Multiplication

The 65c816 does not have a multiplication instruction, but the SNES includes multiplication hardware that can be accessed through registers. Using this hardware is faster than implementing general-purpose multiplication in software.

There are two available multipliers:
 * 5A22 multiplier
   * 8-bit number × 8-bit number = 16-bit result
   * Uses unsigned numbers: $ff (255) × $ff (255) = $fe01 (65025)
   * There is a delay between writing the parameters and correct results being available
   * Can be used any time
 * PPU multiplier
   * 16-bit number × 8-bit number = 24-bit result
   * Uses signed numbers: $ffff (-1) × $ff (-1) = $000001 (1)
   * Results can be read immediately after writing the parameters
   * Can not be used while rendering Mode 7

5A22 multiplier
The multiplier in the 5A22 chip (which also contains the SNES's main processor) is accessed through the following registers:

WRMPYA
$4202
7  bit  0
NNNN NNNN
|||| ||||
++++-++++- First number to multiply (8-bit, unsigned)

WRMPYB
$4203
7  bit  0
NNNN NNNN
|||| ||||
++++-++++- Second number to multiply (8-bit unsigned)

  RDMPYH      RDMPYL
  $4217       $4216
7  bit  0   7  bit  0
HHHH HHHH   LLLL LLLL
|||| ||||   |||| ||||
++++-++++---++++-++++- Multiplication result (16-bit unsigned)

Writing to WRMPYB starts the multiplication process. It takes 8 CPU clock cycles to finish (regardless of whether those clock cycles are 2.68MHz or 3.58MHz), and reading the result early will return an intermediate result.

The cycles spent fetching the instruction that reads RDMPYL and/or RDMPYH count toward the wait. For instance, an instruction that reads the result with absolute addressing spends 3 cycles before reading the result, and one that uses long addressing spends 4 cycles. This means the program only effectively needs to wait 5 or 4 cycles, respectively.

A program can choose to simply fill the required wait time with NOP instructions (2 cycles each). A better option is to spend that time on something useful to the program. The following example spends it on a single instruction that fetches the next number the program intends to multiply:

; Accumulator is 8-bit, index registers are 16-bit
; Multiply eight different numbers by WRMPYA and store the results in RAM
lda 0,y
.repeat 8, I
  sta WRMPYB   ; Kick off the multiplier
  lda I+1,y    ; 5 cycles
  ldx RDMPYL   ; 3 cycles before the read
  stx I*2      ; =8 cycles waiting
.endrep
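For comparison, the simpler NOP-based wait could look something like the sketch below. It assumes the same register widths as the example above (8-bit accumulator, 16-bit index registers) and that the register names are defined as the addresses listed earlier; Product is a hypothetical 16-bit RAM variable, and the two factors are arbitrary. Three NOPs give 6 cycles, one more than the 5 that an absolute-addressed read strictly needs.

```
; Assumes 8-bit accumulator, 16-bit index registers
; Product is a hypothetical 16-bit variable in RAM
lda #12
sta WRMPYA
lda #10
sta WRMPYB   ; Kick off the multiplier
nop          ; 2 cycles
nop          ; 2 cycles
nop          ; 2 cycles (6 so far)
ldx RDMPYL   ; 3 more cycles before the read
stx Product  ; Product = 120 ($0078)
```

This wastes the waiting time, so it is mostly useful when there is no other work to slot in between the write and the read.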

Another thing to be aware of is that if the most significant bit written to  is always known to be zero, the result is valid one cycle earlier, and if the two most significant bits are zero, the result is valid two cycles earlier, and so on.

"PPU" multiplier
The other multiplier in the SNES reuses hardware meant for Mode 7, so it can only be used in vblank or in background modes 0 to 6. It's accessed through the following registers, also described on PPU registers:

M7A
$211B
15 bit  8   7  bit  0
DDDD DDDD   dddd dddd
|||| ||||   |||| ||||
++++-++++---++++-++++- 16-bit multiplication factor (signed)

On write: M7A = (value << 8) | mode7_latch
          mode7_latch = value

This means that $211B must be written to twice, with the lower 8 bits of the number being written before the upper 8 bits. "STA M7A \ STZ M7A" can be used to just write an 8-bit number here with zero as the upper byte.

M7B
$211C
7  bit  0
dddd dddd
|||| ||||
++++-++++- 8-bit multiplication factor (signed)

  MPYH        MPYM        MPYL
  $2136       $2135       $2134
7  bit  0   7  bit  0   7  bit  0
HHHH HHHH   MMMM MMMM   LLLL LLLL
|||| ||||   |||| ||||   |||| ||||
++++-++++---++++-++++---++++-++++- Multiplication result (signed)

The advantages of using this multiplier are that the results are available immediately and that it can use a 16-bit number for one of the parameters. It does not have to be started like the other one: writing to M7A or M7B will immediately calculate the result.

It's important to note that this multiplier uses signed numbers, so numbers like 128 are out of range for M7B, and anything with the most significant bit set will be interpreted as negative.
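As a rough illustration of the write order and the signed result, the sketch below multiplies 1000 by -3. It assumes an 8-bit accumulator, that the code runs during vblank or outside Mode 7, and that the register names are defined as the addresses listed above; Result is a hypothetical 3-byte RAM variable.

```
; Assumes 8-bit accumulator; Result is a hypothetical 3-byte variable in RAM
lda #$e8     ; Low byte of 1000 ($03e8) goes in first
sta M7A
lda #$03     ; High byte second
sta M7A
lda #$fd     ; -3 as a signed 8-bit number
sta M7B
lda MPYL     ; No wait needed before reading
sta Result   ; $48
lda MPYM
sta Result+1 ; $f4
lda MPYH
sta Result+2 ; $ff, so the result is $fff448 (-3000)
```

The top byte is all ones here because the 24-bit result is negative; code that only keeps the lower 16 bits still gets a correct signed value as long as the product fits in that range.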

Multiplication by a constant
The multiplication registers are the fastest option when code needs to multiply two arbitrary numbers together. However, when code can rely on one of the numbers always being the same, it can sometimes be faster to use tables or shifts. For instance, multiplication by three can be as simple as:

sta Temp
asl
adc Temp ; Assume that carry got cleared by the ASL
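The same idea extends to other constants that break down into a few shifts and adds. As a sketch, multiplication by ten can be written as (value × 8) + (value × 2). This assumes an 8-bit accumulator holding a value of 25 or less, so that no shift overflows and the carry is clear for the ADC; Temp is the same scratch variable as above.

```
; Assumes 8-bit accumulator with a value of 25 or less
asl          ; A = value * 2
sta Temp
asl          ; A = value * 4
asl          ; A = value * 8
adc Temp     ; A = value * 10; carry was cleared by the last ASL
```

With larger inputs the shifts can set the carry, so the code would need a CLC before the ADC and a wider accumulator or result.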