The 6502 is a fascinating core. It was my first encounter with central processing units, quickly followed by Intel 8080, Zilog Z80 and Z8. Those were the days. BBC Micro, Tandy TRS-80, the 24-pound “portable” Osborne 1. Memory lane.
The 6502’s instruction set is minimal. Its register file is also tiny relative to modern cores. Just three accessible 8-bit registers: the accumulator, X and Y. Of course, the register file also includes an 8-bit stack pointer, a 16-bit program counter and an 8-bit processor status register. The stack pointer addresses the 256-byte address “page” at \(01xx_{16}\). A rich set of direct and indirect addressing “modes” complements the limited instruction and register set. What the programmer can do even with such computing limitations still amazes.
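As a rough illustration of that register file, here is a minimal sketch in C. The struct and function names are hypothetical and are not the emulator library's actual definitions; only the register widths and the fixed stack page come from the description above.

```c
#include <stdint.h>

/* Hypothetical register-file layout; names are assumptions for illustration. */
struct cpu_regs_sketch {
    uint8_t a;   /* accumulator */
    uint8_t x;   /* index register X */
    uint8_t y;   /* index register Y */
    uint8_t s;   /* stack pointer: low address byte only */
    uint8_t p;   /* processor status, the "flags" */
    uint16_t pc; /* program counter */
};

/* The stack lives in page one, so the high address byte is always 0x01. */
static inline uint16_t stack_address(const struct cpu_regs_sketch *cpu)
{
    return (uint16_t)(0x0100u | cpu->s);
}
```

With the stack pointer at `0xfd`, for example, the sketch resolves the stack address to `0x01fd`.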
The problem
The instructions that add and subtract two 8-bit integers, ADC and SBC, are arguably the most complicated on the 6502. The arithmetic can also operate on BCD, binary-coded decimal. In the old days, binary-coded decimal stood in for an FPU. It was a handy way to represent decimal numbers of prescribed, limited precision.
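For readers who have not met binary-coded decimal, the following sketch shows the packed form that the 6502's decimal mode works on: two decimal digits per byte, one in each nibble. The helper names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Pack a value in the range 0..99 into one BCD byte:
 * tens digit in the high nibble, units digit in the low nibble. */
static inline uint8_t bcd_pack(uint8_t value)
{
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}

/* Recover the plain binary value from a packed BCD byte. */
static inline uint8_t bcd_unpack(uint8_t bcd)
{
    return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0f));
}
```

Decimal 42 packs to `0x42`, which is why BCD values read naturally in hexadecimal dumps. A decimal-mode addition of `0x19` and `0x01` should therefore give `0x20`, not the binary sum `0x1a`.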
Add with carry. Subtract with carry. The programmer loads the 8-bit accumulator and then applies the operand. What do these operations look like in emulated C? Such C code would appear in an emulator library for a 6502. It would reflect the actual circuit-level logic and microcode, mimicking the logical pipeline from bits input to bits output buried deeply within the CPU’s arithmetic and logic unit, ALU.
The solution
Cutting a long story short, the ADC operation amounts to a C function:
static inline void mcs650x_adc_d(struct mcs650x_cpu *cpu, uint8_t d)
{
    const uint8_t a = cpu->a, p = cpu->p;
    uint16_t ac = (uint16_t)a + d + (p & MCS650X_P_C);
    const uint8_t z = ac & 0xff ? 0 : MCS650X_P_Z;
    if (p & MCS650X_P_D)
    {
        if ((ac ^ a ^ d) & 0x010 && (ac & 0x00f) >= 0x00a)
            ac -= 0x010;
        ac += mcs650x_da_lo(ac, a, d);
    }
    const uint8_t n = ac & MCS650X_P_N;
    const uint8_t v = (((a ^ n) & (d ^ n)) >> (mcs650x_p_n - mcs650x_p_v)) & MCS650X_P_V;
    if (p & MCS650X_P_D)
    {
        if ((ac & 0x100) && (ac & 0x0f0) >= 0x0a0)
            ac -= 0x100;
        ac += mcs650x_da_hi(ac);
    }
    const uint8_t c = (ac >> 8) & MCS650X_P_C;
    cpu->a = ac;
    cpu->p = (p & ~MCS650X_P_NVZC) | n | v | z | c;
}
And the SBC operation, similarly, amounts to:
static inline void mcs650x_sbc(struct mcs650x_cpu *cpu)
{
    uint8_t a = cpu->a, d = cpu->d, p = cpu->p;
    uint16_t ac = (uint16_t)a - d - ((p & MCS650X_P_C) ^ MCS650X_P_C);
    if (p & MCS650X_P_D)
        ac -= mcs650x_da_hi(ac) | mcs650x_da_lo(ac, a, d);
    const uint8_t n = ac & MCS650X_P_N;
    const uint8_t v = (((a ^ d) & (a ^ n)) >> (mcs650x_p_n - mcs650x_p_v)) & MCS650X_P_V;
    const uint8_t z = ac & 0xff ? 0 : MCS650X_P_Z;
    const uint8_t c = ((ac >> 8) & MCS650X_P_C) ^ MCS650X_P_C;
    cpu->a = ac;
    cpu->p = (p & ~MCS650X_P_NVZC) | n | v | z | c;
}
While ostensibly simple, the implementations hide some important subtleties, particularly relating to carry and half carry.
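The half carry deserves a small illustration. The ADC listing tests `(ac ^ a ^ d) & 0x010`, and the sketch below isolates that idea. Each sum bit equals the two operand bits exclusive-ORed with the carry into that position, so exclusive-ORing the sum with both operands leaves only the carries. Bit 4 of that pattern is the carry out of the low nibble. The function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when a + d + carry_in carries out of bit 3 into bit 4,
 * that is, when the low nibble overflows; returns 0 otherwise. */
static inline int half_carry(uint8_t a, uint8_t d, uint8_t carry_in)
{
    const uint16_t sum = (uint16_t)(a + d + (carry_in & 1));
    /* sum ^ a ^ d leaves the carry-in of every bit position. */
    return ((sum ^ a ^ d) & 0x10) != 0;
}
```

Adding `0x08` and `0x08` overflows the low nibble and reports a half carry, whereas adding `0x10` and `0x10` only touches the high nibble and does not.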
Explanations
At its most basic, the ADC add-with-carry operation becomes in C:
uint16_t ac = (uint16_t)a + d + (p & MCS650X_P_C);
Accumulator plus data plus carry becomes new accumulator plus new carry, ac. The term p accesses the CPU’s processor status word, also known as the “flags.” The carry flag lives in the status word’s least significant bit, bit \(0\). Casting the incoming accumulator a to an unsigned 16-bit integer makes the intent explicit to compiler and reader alike: the addition spans at least 16 bits, not eight. The ninth bit, bit \(8\), becomes the new carry. This approach assumes a host whose integers are at least 16 bits wide.
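A worked example may help here. The sketch below uses the standard 6502 status-bit positions, but the `FLAG_*` names and the function name are illustrative stand-ins, since the definitions behind the `MCS650X_P_*` masks are not shown in the listings.

```c
#include <stdint.h>

/* Status-bit masks in the standard 6502 layout; names are illustrative. */
enum {
    FLAG_C = 0x01, /* carry, bit 0 */
    FLAG_Z = 0x02, /* zero, bit 1 */
    FLAG_D = 0x08, /* decimal mode, bit 3 */
    FLAG_V = 0x40, /* overflow, bit 6 */
    FLAG_N = 0x80  /* negative, bit 7 */
};

/* Binary-mode add with carry. Returns the 9-bit result:
 * bits 0..7 are the new accumulator, bit 8 is the new carry. */
static inline uint16_t adc_binary(uint8_t a, uint8_t d, uint8_t p)
{
    return (uint16_t)((uint16_t)a + d + (p & FLAG_C));
}
```

Adding `0x01` to `0xff` with carry clear yields `0x100`: the accumulator wraps to zero and the ninth bit carries out. Because the carry mask is `0x01`, the masked flag adds directly as the value one with no shifting.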
Basic subtraction similarly amounts to a simple C statement:
uint16_t ac = (uint16_t)a - d - ((p & MCS650X_P_C) ^ MCS650X_P_C);
It subtracts d and the borrow from the accumulator. Carry inversion is the main difference. On the 6502, “borrow” is not “carry,” \(B=\overline C\). Exclusive-OR with the carry mask inverts the carry bit to give the borrow.
Simple. Almost not worth the ink.
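To make the borrow convention concrete, here is a minimal binary-mode sketch. The `FLAG_C` macro and both function names are assumptions for illustration, standing in for `MCS650X_P_C` and the relevant lines of the SBC listing.

```c
#include <stdint.h>

#define FLAG_C 0x01u /* carry mask, bit 0 of the status word */

/* Binary-mode subtract with borrow, where borrow = NOT carry.
 * The result wraps modulo 65536, so bit 8 is set when a borrow occurred. */
static inline uint16_t sbc_binary(uint8_t a, uint8_t d, uint8_t p)
{
    return (uint16_t)(a - d - ((p & FLAG_C) ^ FLAG_C));
}

/* New carry flag: set when no borrow occurred, hence the inversion. */
static inline uint8_t sbc_carry_out(uint16_t ac)
{
    return (uint8_t)(((ac >> 8) & FLAG_C) ^ FLAG_C);
}
```

With carry set, `0x50` minus `0x30` gives `0x20` and leaves carry set, meaning no borrow. Reversing the operands gives `0xe0` with carry clear, which signals a borrow to the next, more significant byte of a multi-byte subtraction.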
Conclusions
Does it matter? Well, yes. It helps to comprehend a core more deeply. Most of the time, an embedded firmware engineer plies his trade in C, and not without reason. C acts as the lingua franca. The human programmer will typically not improve on the C compiler’s generated code. Under normal circumstances, embedded code binds tightly to the target platform, be it C or assembler. Yet the C compiler adds convenience and some measure of higher-level portability. Nevertheless, knowing the details helps to inform the designer, the debugger and the tester. Details matter. In principle, the same approach applies to more complex cores, e.g. the ARM Cortex family. Charting out how a CPU does what it does is never a wasted exercise.