Chapter 4: The Cortex-M33 and Thumb-2

This chapter is a closer look at the processor that runs your ticktrace code. Don't try to memorise everything here; treat it as a reference to come back to. You'll absorb the details as you write programs.

The register file

The Cortex-M33 has sixteen 32-bit core registers, r0–r15: thirteen general-purpose registers plus the stack pointer, link register, and program counter. Most ticktrace documentation refers to them by the names below.

Cortex-M33 register file with AAPCS roles

Name	Alias	Role
`r0`–`r3`		Argument and return registers. Caller-saved.
`r4`–`r11`		General-purpose. Callee-saved.
`r12`	`ip`	Scratch / intra-procedure temporary. Caller-saved.
`r13`	`sp`	Stack pointer.
`r14`	`lr`	Link register (return address).
`r15`	`pc`	Program counter.

"Caller-saved" means: if the caller cares about the value, the caller saves it before the call. "Callee-saved" means: a called function must preserve the value, so the caller can rely on it surviving the call. We unpack this fully in chapter 8.

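There is also a special-purpose register called APSR, the Application Program Status Register. Its top four bits hold the condition flags:

N (negative): set if bit 31 of the result is 1, i.e. the result is negative when read as a signed number.
Z (zero): set if the result was zero.
C (carry): set on an unsigned carry out of bit 31 for additions; for subtractions and comparisons, set when there is no borrow (the first operand was unsigned greater than or equal to the second). Flag-setting shifts set it to the last bit shifted out.
V (overflow): set on signed overflow.

You rarely name APSR directly in code (mrs and msr can read and write it when you need to). Instead, instructions that end in s write the flags, as do cmp, cmn, tst, and teq, and conditional branches read them.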
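As a host-side illustration, the C sketch below computes the four flags the way adds would set them for a 32-bit addition. It models the rules listed above for building intuition; it is not code for the M33, and struct nzcv and adds_flags are names we made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the APSR condition flags. */
struct nzcv { bool n, z, c, v; };

/* Flags that adds would produce for a + b. */
static struct nzcv adds_flags(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + b;   /* 33-bit sum: bit 32 is the carry-out */
    uint32_t r = (uint32_t)wide;       /* the 32-bit result the register gets */
    struct nzcv f;
    f.n = (r >> 31) & 1u;              /* N: bit 31 of the result */
    f.z = (r == 0);                    /* Z: result is zero */
    f.c = (wide >> 32) & 1u;           /* C: unsigned carry out of bit 31 */
    /* V: both operands share a sign, but the result's sign differs */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}
```

For example, 0x7fffffff + 1 sets N and V but not C (signed overflow), while 0xffffffff + 1 sets Z and C but not V (unsigned wraparound).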

ARM, Thumb, and Thumb-2

ARM cores historically supported two instruction encodings:

ARM mode. Every instruction is 32 bits. Flexible but bulky.
Thumb mode. Every instruction is 16 bits. Smaller code, slightly fewer features.

Then ARM introduced Thumb-2, which mixes 16-bit and 32-bit Thumb encodings in the same instruction stream. Each instruction is whichever size it needs to be. This is the best of both worlds: dense code where a 16-bit form exists, and the full power of 32-bit when it doesn't.
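The core can tell the two sizes apart from the first halfword alone: if its top five bits are 0b11101, 0b11110, or 0b11111, it starts a 32-bit instruction; anything else is a complete 16-bit instruction. Here is that rule as a host-side C sketch. The function name is hypothetical, and the sample encodings in the checks (0x2019 for movs r0, #25, 0x4770 for bx lr, 0xf000 as the first half of a bl) are just convenient inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the Thumb instruction whose first halfword is hw. */
static unsigned thumb_insn_size(uint16_t hw) {
    unsigned top5 = hw >> 11;                     /* bits [15:11] */
    /* 0b11101, 0b11110, 0b11111 mark the first half of a 32-bit instruction */
    return (top5 == 0x1du || top5 == 0x1eu || top5 == 0x1fu) ? 4u : 2u;
}
```

Anything that walks a Thumb-2 instruction stream, such as a disassembler, has to apply this test at every step before it knows where the next instruction begins.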

The Cortex-M33 only runs Thumb-2. There is no ARM mode at all on M-profile cores. This is why every ticktrace function is preceded by:

    .thumb_func
    .global  my_function
my_function:
    ...

The .thumb_func directive tells the assembler that the symbol that follows is Thumb code, so references to its address get bit 0 set. On Cortex-M, bit 0 of a function pointer is a marker meaning "I am Thumb code", not part of the address. If you branch to an address whose bit 0 is clear (with bx, blx, or by loading it into pc), the core tries to switch to the nonexistent ARM state and faults: a UsageFault, escalated to a HardFault if UsageFault is not enabled.

You don't need to think about this directly (.thumb_func does it for you), but it explains a few small mysteries down the road.
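Here is the bit-0 convention written out as three C helpers. This is a host-side sketch with hypothetical names, and the flash address used in the checks is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What a .thumb_func symbol's address resolves to: the code address with bit 0 set. */
static uint32_t thumb_pointer(uint32_t code_addr) { return code_addr | 1u; }

/* Where the instruction really lives: bit 0 is a marker, not part of the address. */
static uint32_t code_address(uint32_t fn_ptr) { return fn_ptr & ~1u; }

/* bx/blx to a target with bit 0 clear asks for ARM state, which M-profile lacks. */
static bool bx_would_fault(uint32_t target) { return (target & 1u) == 0; }
```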

The instruction set, in broad strokes

You don't need to learn every instruction up front. The everyday working set is small. Here's what you'll see constantly in ticktrace:

Move and load constants

    movs    r0, #25         @ r0 = 25 (small immediate, < 256)
    mov     r1, r2          @ r1 = r2
    ldr     r0, =0x40014000 @ r0 = the 32-bit constant 0x40014000
    ldr     r0, =banner     @ r0 = address of label "banner"

movs with # takes a small immediate. The 16-bit form accepts 0 to 255; the 32-bit form accepts a "modified immediate": an 8-bit value shifted to any position, or a repeating byte pattern such as 0x00ff00ff or 0xabababab. When you need an arbitrary 32-bit constant, write ldr r0, =VALUE: the assembler parks the constant in a nearby literal pool and turns the instruction into a PC-relative load.
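These rules are mechanical enough to check in code. Below is a host-side C sketch of the two immediate forms as the Armv7-M and Armv8-M architecture manuals describe them; fits_movs_narrow and fits_modified_imm are hypothetical names, and in practice the assembler performs this check for you and rejects constants that don't fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 16-bit movs rd, #imm8: any value 0..255. */
static bool fits_movs_narrow(uint32_t v) { return v <= 0xffu; }

/* 32-bit Thumb-2 modified immediate. Encodable values:
     - an 8-bit value shifted left by 0..24 bits
     - 0x00XY00XY, 0xXY00XY00, or 0xXYXYXYXY for some byte XY */
static bool fits_modified_imm(uint32_t v) {
    for (unsigned s = 0; s <= 24u; s++) {
        uint32_t b = v >> s;
        if (b <= 0xffu && (b << s) == v)   /* all set bits inside one 8-bit window */
            return true;
    }
    uint32_t lo = v & 0xffu;               /* candidate XY for 0x00XY00XY and 0xXYXYXYXY */
    uint32_t hi = (v >> 8) & 0xffu;        /* candidate XY for 0xXY00XY00 */
    return v == lo * 0x00010001u
        || v == hi * 0x01000100u
        || v == lo * 0x01010101u;
}
```

By this check 1 << 25 fits the 32-bit form, while 0x40014000 fits neither form, so loading it takes ldr r0, = (or a movw/movt pair).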

Arithmetic

    adds    r0, r1, r2      @ r0 = r1 + r2 (and update flags)
    subs    r0, #1          @ r0 = r0 - 1
    muls    r0, r1, r0      @ r0 = r1 * r0
    sdiv    r0, r1, r2      @ r0 = r1 / r2 (signed, rounds toward zero)
    udiv    r0, r1, r2      @ r0 = r1 / r2 (unsigned)
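The listing leaves out the edge cases of these instructions. The C sketch below models the results on a host; the function names are hypothetical, and it assumes the CCR.DIV_0_TRP bit is clear (its architectural reset value), in which case dividing by zero yields 0 rather than raising a UsageFault.

```c
#include <assert.h>
#include <stdint.h>

/* muls keeps only the low 32 bits of the product. */
static uint32_t muls_model(uint32_t a, uint32_t b) { return a * b; }

/* udiv: unsigned quotient, or 0 for a zero divisor (DIV_0_TRP clear). */
static uint32_t udiv_model(uint32_t n, uint32_t d) { return d == 0 ? 0u : n / d; }

/* sdiv: signed quotient rounded toward zero, which C99 '/' also does. */
static int32_t sdiv_model(int32_t n, int32_t d) {
    if (d == 0) return 0;                              /* DIV_0_TRP clear */
    if (n == INT32_MIN && d == -1) return INT32_MIN;   /* the one overflow case wraps */
    return n / d;
}
```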

Bit operations

    ands    r0, r1          @ r0 &= r1
    orrs    r0, r1          @ r0 |= r1
    eors    r0, r1          @ r0 ^= r1
    bics    r0, r1          @ r0 &= ~r1
    lsls    r0, r1, #5      @ r0 = r1 << 5
    lsrs    r0, r1, #2      @ r0 = r1 >> 2 (logical, fills with 0)
    asrs    r0, r1, #2      @ r0 = r1 >> 2 (arithmetic, sign-extends)
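The difference between lsrs and asrs only shows up when bit 31 is set. This host-side C sketch models both shifts plus lsls and bics; the names are hypothetical, and shift amounts are limited to 0..31 to keep the C well-defined.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t lsl32(uint32_t x, unsigned n) { assert(n < 32u); return x << n; }
static uint32_t lsr32(uint32_t x, unsigned n) { assert(n < 32u); return x >> n; } /* zero-fill */

/* asr: copies of bit 31 shift in from the top. Written without a signed
   right shift, whose result for negative values is implementation-defined in C. */
static uint32_t asr32(uint32_t x, unsigned n) {
    assert(n < 32u);
    return (x >> 31) ? ~(~x >> n) : (x >> n);
}

static uint32_t bic32(uint32_t x, uint32_t mask) { return x & ~mask; } /* bics */
```

For instance, shifting 0x80000000 right by 4 gives 0x08000000 with lsrs but 0xf8000000 with asrs.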

Memory access

    ldr     r0, [r1]        @ r0 = *(uint32_t*)r1
    ldr     r0, [r1, #4]    @ r0 = *(uint32_t*)(r1 + 4)
    str     r0, [r1]        @ *(uint32_t*)r1 = r0
    ldrb    r0, [r1]        @ r0 = *(uint8_t*)r1 (zero-extended)
    ldrh    r0, [r1]        @ r0 = *(uint16_t*)r1 (zero-extended)
    strb    r0, [r1]        @ *(uint8_t*)r1 = r0 (low byte)
    strh    r0, [r1]        @ *(uint16_t*)r1 = r0 (low halfword)

The [...] syntax means "the memory at the address inside these brackets". You can also write [r1, r2] (register offset: address r1 + r2), [r1], #4 (post-index: access at r1, then add 4 to r1), and [r1, #4]! (pre-index with writeback: add 4 to r1, then access at the new r1). You'll see them all in real ticktrace code.
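Here is a host-side C model of those addressing forms, using a 16-word array as memory and byte offsets into it as addresses. The names are hypothetical; the point is the order of the access and the base-register update.

```c
#include <assert.h>
#include <stdint.h>

/* Toy memory: byte address A lives in mem[A / 4] (word-aligned accesses only). */
static uint32_t mem[16];

static uint32_t load_word(uint32_t addr) {
    assert(addr % 4u == 0 && addr / 4u < 16u);
    return mem[addr / 4u];
}

/* ldr rt, [rn, #imm]    offset: rn unchanged */
static uint32_t ldr_offset(uint32_t rn, int32_t imm) { return load_word(rn + (uint32_t)imm); }

/* ldr rt, [rn, rm]      register offset: rn unchanged */
static uint32_t ldr_reg(uint32_t rn, uint32_t rm) { return load_word(rn + rm); }

/* ldr rt, [rn], #imm    post-index: access at rn, then rn += imm */
static uint32_t ldr_post(uint32_t *rn, int32_t imm) {
    uint32_t v = load_word(*rn);
    *rn += (uint32_t)imm;
    return v;
}

/* ldr rt, [rn, #imm]!   pre-index with writeback: rn += imm, then access */
static uint32_t ldr_pre(uint32_t *rn, int32_t imm) {
    *rn += (uint32_t)imm;
    return load_word(*rn);
}
```

Post-index is the natural shape for walking an array: each load hands back the current element and leaves the pointer on the next one.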

Compare and branch

    cmp     r0, r1          @ compute r0 - r1, set flags, discard result
    beq     .Lequal         @ branch if Z=1 (equal)
    bne     .Lloop          @ branch if Z=0
    blt     .Lneg           @ branch if signed less-than
    cbz     r0, .Lzero      @ branch if r0 == 0 (r0-r7 only, forward only)
    cbnz    r0, .Lnonzero   @ branch if r0 != 0 (same limits)
    b       .Ldone          @ unconditional branch
    bl      function        @ branch with link (function call)
    bx      lr              @ branch to register (function return)

bl saves the address of the next instruction (with bit 0 set) into lr so that the called function can return to it. bx lr returns by jumping to the address in lr. The x in bx stands for "exchange", as in exchanging instruction sets (also called interworking): the CPU looks at bit 0 of the target to decide whether it is Thumb code. Since all our code is Thumb, bit 0 of lr is always 1, and bx lr is what a function return looks like.
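The condition checks behind those branches can be modeled in a few lines of C. This host-side sketch (hypothetical names) computes the flags cmp sets and evaluates the conditions used by beq, bne, blt, bge, and the unsigned pair blo and bhs, following the standard Arm rules: LT when N != V, LO when C == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nzcv { bool n, z, c, v; };

/* Flags that cmp a, b sets: it computes a - b and discards the result. */
static struct nzcv cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    struct nzcv f;
    f.n = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (a >= b);                          /* C = 1 means "no borrow" */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;  /* signs differed and result sign flipped */
    return f;
}

static bool cond_eq(struct nzcv f) { return f.z; }          /* beq */
static bool cond_ne(struct nzcv f) { return !f.z; }         /* bne */
static bool cond_lt(struct nzcv f) { return f.n != f.v; }   /* blt: signed < */
static bool cond_ge(struct nzcv f) { return f.n == f.v; }   /* bge: signed >= */
static bool cond_lo(struct nzcv f) { return !f.c; }         /* blo: unsigned < */
static bool cond_hs(struct nzcv f) { return f.c; }          /* bhs: unsigned >= */
```

The same comparison of 0xffffffff with 1 is "less than" when read as signed (-1 < 1) but "higher or same" when read as unsigned, which is why signed and unsigned branches are separate conditions.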

Pushing and popping

    push    {r4, r5, lr}    @ predecrement sp, store r4, r5, lr
    pop     {r4, r5, pc}    @ load into r4, r5, pc; postincrement sp

push and pop are how a function preserves the callee-saved registers it uses, and how it saves lr when it makes a bl call of its own. Popping into pc is the standard return idiom for functions that push lr.
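A host-side C model of push and pop makes the stack layout visible. It uses a 16-word toy stack with byte offsets as addresses (all names hypothetical) and follows the Arm rule that push moves sp down first and stores the lowest-numbered register at the lowest address.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack memory; "addresses" are word-aligned byte offsets into it. */
#define STACK_WORDS 16u
static uint32_t stack_mem[STACK_WORDS];

/* push {list}: sp -= 4 per register, then store in ascending register order.
   vals[] must list the registers lowest-numbered first. */
static void push_regs(uint32_t *sp, const uint32_t *vals, unsigned n) {
    assert(*sp >= 4u * n);
    *sp -= 4u * n;
    for (unsigned i = 0; i < n; i++)
        stack_mem[*sp / 4u + i] = vals[i];
}

/* pop {list}: load in the same order, then sp += 4 per register. */
static void pop_regs(uint32_t *sp, uint32_t *vals, unsigned n) {
    assert(*sp + 4u * n <= 4u * STACK_WORDS);
    for (unsigned i = 0; i < n; i++)
        vals[i] = stack_mem[*sp / 4u + i];
    *sp += 4u * n;
}
```

Pushing {r4, r5, lr} and popping {r4, r5, pc} use the same three slots, so the value saved from lr lands in pc, and that is the return.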

The memory map

On the RP2350, the 32-bit address space is divided into regions. The ones you'll care about in this book are:

RP2350 memory map

Address range	What lives here
`0x00000000`–`0x00007fff`	Bootrom (32 KB, read-only)
`0x10000000`–`0x103fffff`	QSPI flash, mapped via XIP cache (4 MB on Pico 2)
`0x20000000`–`0x20081fff`	SRAM (520 KB)
`0x40000000`–`0x4fffffff`	APB peripherals (UART, GPIO, I2C, etc.)
`0xd0000000`–`0xd000ffff`	SIO (single-cycle I/O, includes GPIO out)
`0xe0000000`–`0xe00fffff`	Cortex-M33 system control (NVIC, SysTick)

The two regions you'll talk to most are SRAM (your stack, your data, sometimes your code during testing) and the peripheral region. Every hardware peripheral on the chip has a base address; you write to a specific offset from that base to control it. We meet this concept formally in chapter 9 as memory-mapped I/O.
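Because the regions don't overlap, the table translates directly into a lookup. This C sketch copies the ranges from the table; the names are hypothetical, and OTHER just means "not listed here", since the chip has regions this table omits.

```c
#include <assert.h>
#include <stdint.h>

/* Regions from the RP2350 memory-map table (flash range assumes a 4 MB Pico 2). */
enum region { BOOTROM, XIP_FLASH, SRAM, APB_PERIPH, SIO, M33_PPB, OTHER };

static enum region classify(uint32_t a) {
    if (a <= 0x00007fffu)                     return BOOTROM;
    if (a >= 0x10000000u && a <= 0x103fffffu) return XIP_FLASH;
    if (a >= 0x20000000u && a <= 0x20081fffu) return SRAM;
    if (a >= 0x40000000u && a <= 0x4fffffffu) return APB_PERIPH;
    if (a >= 0xd0000000u && a <= 0xd000ffffu) return SIO;
    if (a >= 0xe0000000u && a <= 0xe00fffffu) return M33_PPB;
    return OTHER;
}
```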

Atomic register aliases: the RP2 trick

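Most peripheral registers on most chips require a read-modify-write to change one bit: read the register, OR in the bit, write it back. That takes three instructions and is not atomic: if an interrupt handler modifies the same register between your read and your write, your write silently undoes its change.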

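The RP2350 (and the RP2040 before it) avoid this by mapping each peripheral register four times, at these offsets from its normal address. The one exception is SIO, which has no alias windows; instead it provides dedicated SET, CLR, and XOR registers for GPIO output that work the same way.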

Atomic alias windows

Offset	Effect
`+0x0000`	Normal read/write
`+0x1000`	Atomic XOR (writing 1 toggles that bit)
`+0x2000`	Atomic SET (writing 1 sets that bit)
`+0x3000`	Atomic CLR (writing 1 clears that bit)

So to set, clear, or toggle a bit you do not need a read-modify-write. You store a mask with a 1 in each bit you want to change at the matching alias, and the hardware applies the operation as part of that single write.

ticktrace uses this aggressively. You'll see, for example:

    ldr     r0, =SIO_BASE + SIO_GPIO_OUT_XOR_OFFS
    movs    r1, #(1 << 25)
    str     r1, [r0]            @ toggle GP25 atomically

That single str is the entire LED toggle: no read-modify-write sequence, no register to hold the old value, and no race with an ISR. This idiom is the signature move of RP2 assembly and one of the prettiest tricks on the chip.
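To check the arithmetic in the alias table, here is a host-side C sketch that computes an alias address and models what writing a mask through each window does to the register's value. The names are hypothetical, and it applies to the aliased peripheral registers, not SIO.

```c
#include <assert.h>
#include <stdint.h>

/* Alias window offsets from the table. */
enum alias { ALIAS_RW = 0x0000, ALIAS_XOR = 0x1000, ALIAS_SET = 0x2000, ALIAS_CLR = 0x3000 };

/* Address to write, given the register's normal (+0x0000) address. */
static uint32_t alias_addr(uint32_t reg_addr, enum alias a) {
    return reg_addr + (uint32_t)a;
}

/* New register value after writing `mask` through alias `a`.
   On the chip this happens in one bus write; here it is just arithmetic. */
static uint32_t apply_alias_write(uint32_t old, enum alias a, uint32_t mask) {
    switch (a) {
    case ALIAS_XOR: return old ^ mask;    /* 1 bits toggle */
    case ALIAS_SET: return old | mask;    /* 1 bits become 1 */
    case ALIAS_CLR: return old & ~mask;   /* 1 bits become 0 */
    default:        return mask;          /* normal write replaces everything */
    }
}
```

In every alias, 0 bits in the mask leave the corresponding register bits alone, which is what makes a single store safe against a concurrent ISR touching other bits.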

What state is the CPU in at boot?

When the bootrom hands off to your reset handler, you can rely on:

The CPU is in Thumb mode (it's always in Thumb mode on M33).
The MSP (main stack pointer) is loaded from the first word of your vector table.
The PC is loaded from the second word of your vector table.
Interrupts are disabled at the NVIC level (no peripheral has yet been enabled to fire).
The XOSC is running at 12 MHz and is the source for clk_sys.

You inherit a clean machine. From there, what happens next is whatever you write, which is the whole point of this book.
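One way to see the two vector-table rules in action is to sanity-check a built image's first two words. The C sketch below is hypothetical: it assumes a flash build on a Pico 2 and takes its ranges from the memory map table earlier in this chapter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The first two vector-table words the boot description relies on. */
struct vt_head {
    uint32_t initial_msp;     /* word 0: loaded into MSP */
    uint32_t reset_handler;   /* word 1: where execution starts; bit 0 = Thumb marker */
};

/* The initial MSP may equal the first address past SRAM, because the
   stack grows downward and push decrements before storing. */
static bool vt_looks_sane(struct vt_head v) {
    bool sp_in_sram   = v.initial_msp > 0x20000000u && v.initial_msp <= 0x20082000u;
    bool thumb_bit    = (v.reset_handler & 1u) != 0;
    uint32_t entry    = v.reset_handler & ~1u;
    bool entry_in_xip = entry >= 0x10000000u && entry <= 0x103fffffu;
    return sp_in_sram && thumb_bit && entry_in_xip;
}
```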

Exercises

Caller-saved or callee-saved? For each register, answer without peeking: r0, r4, r7, r12, lr, sp. (r0 caller, r4 callee, r7 callee, r12 caller; lr is effectively caller-saved, because every bl overwrites it, so a function that makes calls must save its own lr first; sp must be preserved.)
Compute an address. GPIO[25].CTRL lives at IO_BANK0_BASE + 4 + 25 × 8. With IO_BANK0_BASE = 0x40028000, what's the byte address? Write a two-instruction Thumb sequence that loads it into r0. (Answer: 0x400280CC, since 4 + 25 × 8 = 204 = 0xCC. movw r0, #0x80cc followed by movt r0, #0x4002 does it in two instructions; ldr r0, =0x400280CC does it in one instruction plus a literal-pool word.)
Atomic alias arithmetic. Suppose RESETS_RESET is at 0x40020000. What address do you write to atomically clear bits in it? (0x40023000, base + 0x3000 CLR alias.)
Why bit 0? Why does the .thumb_func directive set bit 0 of a symbol's address? What happens if you bx to an even address? (Cortex-M only runs Thumb, and bit 0 is the Thumb marker. A bx to an even address raises a UsageFault, which escalates to a HardFault if UsageFault is not enabled.)
Where does main live? Looking at the memory map, where will the main symbol's address fall when the program is built for flash? What about when built for the SRAM test variant? (Flash: somewhere in 0x10000000-0x103FFFFF. SRAM: 0x20000000+.)
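If you want to check exercise 2 on a host, here is a C sketch. IO_BANK0_BASE comes from the exercise; the helper names are hypothetical. The 16-bit halves are what a movw/movt pair loads: movw writes the low half and zeroes the top, then movt writes the top half and leaves the low half alone.

```c
#include <assert.h>
#include <stdint.h>

#define IO_BANK0_BASE 0x40028000u   /* value given in exercise 2 */

/* GPIOn.CTRL per the exercise formula: base + 4 + n * 8. */
static uint32_t gpio_ctrl_addr(unsigned pin) { return IO_BANK0_BASE + 4u + pin * 8u; }

static uint16_t movw_half(uint32_t v) { return (uint16_t)(v & 0xffffu); }  /* low 16 bits */
static uint16_t movt_half(uint32_t v) { return (uint16_t)(v >> 16); }      /* high 16 bits */
```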

What's next

You now have a working vocabulary of registers, instructions, and the RP2350 memory map. The next chapter gets the toolchain installed so we can turn assembly source into a runnable .uf2 file.
