This section contains some elementary routines that perform certain actions you are likely to need in the future. Besides, they are also useful to get a bit more practice in creating more and more efficient structures. I will also try to gradually introduce a new way of commenting from now on, because it doesn’t make much sense to comment every single line. My aim is to help you develop a better skill of understanding others’ sources on your own. It is essential for you to be able to understand these routines on your own, so take the time to think about how they work. Before starting the discussion of these tasks, I have to introduce some additional instructions.
All the computers are capable of doing basically one single thing: they manipulate data. This section gives an introduction to these possibilities in the case of the Z80.
The Z80 processor is able to directly add or subtract both 8 and 16-bit numbers. These operations are performed by four simple instructions: add
, sub
, adc
and sbc
. (If you have been reading linearly, you could already see add
in action.) Except for sub
, they all have two operands, and the result is written back into the first one. The number of bits is determined by the first operand: 8-bit operations always involve the A register, while the 16-bit versions rely on HL/IX/IY. In the case of sub
, there is only one operand whose value is always subtracted from A, and the result is naturally written back into A, too. The four instructions do the following:
add op1,op2 | – op1=op1+op2 |
sub op2 | – A=A–op2 |
adc op1,op2 | – op1=op1+(op2+carry) |
sbc op1,op2 | – op1=op1–(op2+carry) |
As I said, the first operand is either of the four ones listed above. What the second operand can be depends on the number of bits. In 8-bit operations OP2 can be an 8-bit constant, any 8-bit general purpose register (A, B, C, D, E, H, L, IXH, IXL, IYH or IYL) or an indirectly addressed byte of the memory ((HL), (IX+n), (IY+n), but not (BC) or (DE)!). However, with 16-bit operations you can only use BC, DE, SP or what you used as OP1 (i. e. add hl,ix is not possible, contrary to add hl,hl
), no constants or data in the memory.
Carry is the value of the carry flag: either 0 or 1. You might ask why it is useful to include the value of a flag in some operations, since you could not see such a thing in other programming languages. This is just another thing that is naturally handled by high-level languages, but has to be programmed manually in assembly (one example is adding numbers of a bit number greater than 16). Carry usually holds the (n+1)th bit of the result of arithmetic operations. For instance, if you add two 8-bit numbers, the result generally needs 9 bits to be stored. The name “carry” suggests that this 9th bit might be of some use later, that is the reason to carry it around. You will see some examples for its usage in the following sections.
You could already encounter logical operations if you have programming experience, e. g. when examining conditions like “(i=1) and (j=2)”. On the CPU level, they are preformed by the logical and
, or
and xor
instructions. All the three need an 8-bit operand, which can be the same as that of the 8-bit arithmetic instructions (just about anything). Naturally, the A register is always involved, both as one of the factors and as the holder of the result. The individual bits are completely independent of each other in these operations, and the carry is always cleared after one of these instructions is executed (so if you want to do a 16-bit subtraction without carry, you can still do it with sbc
by putting or a
or and a
before it, so the carry is guaranteed to be zero).
When or
is performed, each bit of the result will be one if at least one of the factors had its corresponding bit set:
%00101110 %10011101 i. e. only two zeroes give zero, all the other combinations result in one --------- %10111111 (result)
In turn, and
makes each bit of the result be zero if at least one of the factors had its corresponding bit reset:
%00101110 %10011101 i. e. only two ones give one, all the other combinations result in zero --------- %00001100 (result)
The third one, xor
(which comes from the expression “exclusive or”) makes a bit of the result set if and only if one of the factors had its corresponding bit set, while the same bit of the other factor was reset prior to execution:
%00101110 %10011101 i. e. inequality gives one, while equality gives zero as a result --------- %10110011 (result)
You are going to use these instructions a lot. To close their discussion, here is a little trick: you can load zero into A by executing xor a
, i. e. by xor
ing A with itself (just think about it why this is true). This is useful as it is faster and smaller than ld a,0
. The only drawback is that it modifies the flags, but you do not usually need to preserve them for a long time anyway.
Let’s assume that we have five 8-bit numbers stored beginning at the address $1000. We want to calculate their sum. For now, we assume that the sum itself will also remain within 8 bits (i. e. it will be less than 256). An unefficient but straightforward solution could be the following:
ld b,0 ; Initialising the partial sum ld a,($1000) ; Reading the first number into the accumulator add a,b ; Adding it to the sum ld b,a ; Writing the sum back to B ld a,($1001) ; Reading the second number into the accumulator add a,b ; Adding it to the sum ld b,a ; Writing the sum back to B ld a,($1002) ; Adding the 3rd value add a,b ld b,a ld a,($1003) ; Adding the 4th value add a,b ld b,a ld a,($1004) ; And adding the 5th value, too add a,b ld b,a
If you have the eyes of an eagle and noticed that something odd is going on at the beginning, you could already ask why not ld
the first value into B instead of zeroing it first. The answer is that this is not the final version. The first improvement could be using indirect addressing instead of directly addressing every single byte of the data.
ld a,0 ; Initialising the partial sum ld hl,$1000 ; Initialising the pointer to the first byte of the data add a,(hl) ; Adding the current data to the sum inc hl ; Proceeding to the next byte of the data add a,(hl) inc hl add a,(hl) inc hl add a,(hl) inc hl add a,(hl) inc hl ld b,a ; We want the result in B
There are two advantages of this solution. First, you can directly add the data to the sum without having to load it first into a register. Naturally, addition implies that the sum is either in A (8-bit) or in HL/IX/IY (16-bit). The other, more important change is that now it is much easier to modify the program if you decide to sum another consecutive series of five numbers in the memory—you only need to modify the initialisation part. Besides these advantages, the code also improved in terms of speed and size. However, there is still a redundant inc hl
at the end... Why? If you look at the code, you can see five add
–inc
pairs. As these pairs are completely identical, we could as well put them in a loop.
ld c,5 ; Setting up the loop counter ld a,0 ; Initialising the partial sum ld hl,$1000 ; Initialising the pointer to the first byte of the data Repeat: ; Adds the current byte to the sum and proceeds add a,(hl) inc hl dec c ; Handling the loop (without djnz this time) jr nz,Repeat ld b,a ; We want the result in B
This also explains why we could not use ld
for the first byte: we had to separate the initialisation from the actual calculation. Doing so is in general unefficient, but it helps to maintain a cleaner code, which is useful in the development stage—but it is certainly worth to optimise the code prior to releasing it. Introducing the loop also enables us to easily modify the number of bytes involved without having to add or remove code. However, when it is possible to use B as loop counter, it is advisable to do so:
ld b,5 ; Setting up the loop counter ld a,0 ; Initialising the partial sum ld hl,$1000 ; Initialising the pointer to the first byte of the data Repeat: ; Adds the current byte to the sum and proceeds add a,(hl) inc hl djnz Repeat ld b,a ; We want the result in B
After going through all this, it is important to note that using 8 bits for the sum is not really practical. Let’s extend it to 16 bits then. And to make it even better, I will also make the loop counter 16-bit:
ld ix,$1000 ; Pointer to the data ld hl,0 ; The beginning sum ld bc,500 ; Loop counter ld d,0 ; The upper 8 bits of the numbers to be added Repeat: ; Adds the current byte to the sum and proceeds ld e,(ix) add hl,de ; Note that D=0 inc ix dec bc ; This instruction does not modify the flags! ld a,b ; Verifying whether the counter reached zero or c ; The zero flag is set if both bytes of BC are zero jr nz,Repeat
If you still don’t understand how this 16-bit counter works, try to remember the principle of the or
operation: if the result is zero, then both values must have been zero. To perform the operation “B or C” the value of B has to be loaded into A, because all the logical operations suppose one of the factors to be in A. Get used to this method, because it is frequently applied in practice.
Let’s say we have two 16-byte (128-bit) numbers in the memory. The first is at $1000, the second at $1010. Their sum is to be put into the 16 bytes starting at $1020. All the numbers start with the least significant byte. The magic word is carry in this case, which holds the bits transferred between the byte boundaries.
ld ix,$1000 ; Pointer to the first number ld b,16 ; The number of bytes in each number or a ; A dummy logical instruction, used to clear the carry Repeat: ; Adds 8 bits on each iteration ld a,(ix) adc a,(ix+$10) ; Add with carry (the 9th bit of the previous addition) ld (ix+$20),a ; Storing the current byte of the result inc ix djnz Repeat
Note that neither 16-bit inc
nor djnz
alters the flags, and actually this is the reason for it. The loop would not work if the CPU designers had not thought about these cases.
This is a typical programming task, you will certainly need to move around data in your programs. Let’s start with the elementary memory block movement. The aim is to move 500 bytes of data from the address $2000 to $4000. Fortunately the Z80 processor is capable of performing this task with a single instruction:
ld hl,$2000 ; Pointer to the source ld de,$4000 ; Pointer to the destination ld bc,500 ; Number of bytes to move ldir ; Moves BC bytes from (HL) to (DE)
The ldir
instruction is a composite instruction, which is equivalent to the following piece of code:
Repeat: ld a,(hl) ; Getting the current byte inc hl ld (de),a ; Storing it inc de dec bc ; Handling the loop ld a,b or c jr nz,Repeat
The only difference (besides the rather obvious fact that ldir
is much smaller and much faster than the loop above) is that the A register is not involved when using ldir
. For the programmers’ convenience there is also an instruction called ldi
which does almost the same thing except that it moves only one byte (but still updates all the three counters!).
This little instruction can also be used to fill each byte of an area of the memory with a given value. I might as well call it a little trick, but it isn’t actually a complicated one. The following code fills 500 bytes from the address $2000 with 150.
ld hl,$2000 ; Pointer to the source ld de,$2001 ; Pointer to the destination ld bc,499 ; Number of bytes to move ld (hl),150 ; The value to fill ldir ; Moves BC bytes from (HL) to (DE)
What happens? If you think it over, you can realise that in each iteration the preceding byte is copied into the current byte, which results in step by step copying the value of 150 at the beginning into each byte of the region. This happens because the two regions—the source and the destination—overlap. Now you could start wondering about what to do if you really want to move these 500 bytes one byte ahead instead of filling them with the same value. The solution is simple: you have to start from the end of the region, and go backwards. The instruction to do this is lddr
, which does almost the same as ldir
, with the only difference that it decrements HL and DE in each iteration. The example to move 500 bytes from the address of $2000 to $2001:
ld hl,$21F3 ; Pointer to the end of the source (500=$1F4) ld de,$21F4 ; Pointer to the end of the destination ld bc,500 ; Number of bytes to move lddr ; Moves BC bytes from (HL) to (DE) backwards
Note that if the overlapping is the other way around, i. e. the destination is at the lower address, you have to use ldir
. Think about this before proceeding to the next section.
After getting to know some elementary methods, we can start thinking about practical problems. The next task is a bit more complicated: there are 200 numbers (8-bit signed integers) stored from the address $1000, and we want to separate the negative and the non-negative numbers. We want to create two separate lists: that of the non-negative numbers at $2000 and the negative values at $3000. A possible solution could look like this:
ld hl,$1000 ; Pointer to the data ld ix,$2000 ; Pointer to the non-negative list ld iy,$3000 ; Pointer to the negative list ld b,200 ; Loop counter Repeat: ld a,(hl) ; Getting and checking the sign of the current element inc hl cp $80 jr nc,Negative ld (ix),a ; Storing a non-negative value inc ix jr Continue Negative: ld (iy),a ; Storing a negative value inc iy Continue: djnz Repeat
A comment for programmers of TI calculators: the IY register is reserved for the system, so you can only use it if you save its value and disable interrupts. In this example, you could use DE instead of IY, but in a normal everyday situation you will most probably find all your registers full of important data, particularly the general purpose registers (A, B, C, D, E, H, L)...
This time I would like to show a way to implement simple bubble sort in Z80 assembly. For those who don’t know the algorithm, here is the explanation:
The code:
ld c,NumberOfElements dec c ; Note that the first step involves N-1 checks ld hl,1 ; Setting H=0 and L=1, for optimising speed Step: ld ix,ArrayAddress ld e,h ; Bit 0 of E will indicate if there was need to swap ld b,c ; C holds the number of elements in the current step Loop: ld a,(ix) ld d,(ix+1) cp d ; If A was less than D, the carry will be set jr c,Continue ld (ix),d ; Swapping order is actually performed by simply writing ld (ix+1),a ; the values back in a reversed order ld e,l ; Swapping is indicated here (L=1) Continue: inc ix djnz Loop dec e jr nz,Finish ; If E became zero after DEC, we have to continue dec c jr nz,Step Finish:
Of course, this is the slowest sorting algorithm, but it is easy to understand. Later, in the advanced section you are going to find an implementation of the QuickSort algorithm, too.
Another useful thing is searching byte sequences in the memory, e. g. strings in a text. The program below does the following: given the address and length of a text, and the same parameters of a string to be found in it (all four are 2-byte words), it returns the (first) address where the string is found in HL. If the text does not contain the string given, it returns 0 in HL.
Start: ld hl,(TextAddress) ld de,(StringAddress) ld bc,(StringLength) Repeat: ; This loop verifies if the text from the current byte ld a,(de) ; matches the string given, character by character. If cp (hl) ; it does, then the zero flag is set. Execution is jr nz,EndRepeat ; continued from EndRepeat, regardless of the success of inc hl ; the search. inc de dec bc ld a,b or c jr nz,Repeat EndRepeat: ld hl,(TextAddress) ; Note that LD preserves the flags jr z,Finish inc hl ; The text pointer is advanced ld (TextAddress),hl ld bc,(TextLength) dec bc ; Total byte count is decreased ld (TextLength),bc ld a,b or c jr nz,Start ld hl,0 ; This part is executed in case of failure (BC=0) Finish: ... ; There should be some code following here, otherwise ; execution would continue in the data part... TextAddress: .word $1000 TextLength: .word 500 StringAddress: .word $2000 StringLength: .word 20
It was intentional that I only gave some loose comments, because by now you should be able to understand what is going on. Take the time to do so, I give you a break for now.