# Z80 Assembly - Simple Tasks

This section contains some elementary routines for tasks you are likely to need in the future. They are also useful for getting more practice in building increasingly efficient structures. From now on I will gradually introduce a new way of commenting, because it does not make much sense to comment every single line. My aim is to help you become better at understanding others’ sources on your own, so take the time to think about how these routines work. Before discussing the tasks, I have to introduce some additional instructions.

## Working with data

Computers basically do one single thing: they manipulate data. This section gives an introduction to what the Z80 offers in this respect.

### Addition and subtraction

The Z80 processor can directly add or subtract both 8- and 16-bit numbers. These operations are performed by four simple instructions: `add`, `sub`, `adc` and `sbc`. (If you have been reading linearly, you have already seen `add` in action.) Except for `sub`, they all have two operands, and the result is written back into the first one. The number of bits is determined by the first operand: 8-bit operations always involve the A register, while the 16-bit versions rely on HL/IX/IY (the 16-bit `adc` and `sbc` only accept HL). In the case of `sub`, there is only one operand, whose value is always subtracted from A, and the result is naturally written back into A, too. The four instructions do the following:

- `add op1,op2` – op1=op1+op2
- `sub op2` – A=A–op2
- `adc op1,op2` – op1=op1+(op2+carry)
- `sbc op1,op2` – op1=op1–(op2+carry)

As I said, the first operand is one of the four registers listed above (A, HL, IX or IY). What the second operand can be depends on the number of bits. In 8-bit operations OP2 can be an 8-bit constant, any 8-bit general purpose register (A, B, C, D, E, H, L, IXH, IXL, IYH or IYL) or an indirectly addressed byte of the memory ((HL), (IX+n), (IY+n), but not (BC) or (DE)!). However, with 16-bit operations you can only use BC, DE, SP or what you used as OP1 (i. e. `add hl,ix` is not possible, contrary to `add hl,hl`), no constants or data in the memory.
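To make these rules a bit more tangible, here is a small sketch of operand combinations. The particular registers and constants are arbitrary choices for illustration, and the two lines that an assembler should reject are commented out.

```asm
  add a,10         ; 8-bit: constant
  add a,b          ; 8-bit: register
  add a,(hl)       ; 8-bit: byte of the memory addressed by HL
  add a,(ix+5)     ; 8-bit: byte of the memory addressed by IX+5
  add hl,bc        ; 16-bit: HL=HL+BC
  add ix,ix        ; 16-bit: OP2 may be the same register as OP1
  adc hl,sp        ; 16-bit with carry: HL=HL+SP+carry
; add hl,ix        ; Not possible: HL and IX cannot be mixed
; add hl,1000      ; Not possible: no 16-bit constants
```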

Carry is the value of the carry flag: either 0 or 1. You might ask why it is useful to include the value of a flag in some operations, since you have probably not seen such a thing in other programming languages. This is just another thing that high-level languages handle for you but has to be programmed manually in assembly (one example is adding numbers wider than 16 bits). Carry usually holds the (n+1)th bit of the result of an n-bit arithmetic operation. For instance, if you add two 8-bit numbers, the result generally needs 9 bits to be stored. The name “carry” suggests that this 9th bit might be of some use later, which is the reason to carry it around. You will see some examples of its usage in the following sections.
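As a minimal sketch of how the carry links two 8-bit additions, the fragment below adds the 16-bit value in DE to the one in BC using only 8-bit instructions, leaving the result in HL. The register assignment is my own choice for illustration; a single `add hl,de` would normally be preferred when the numbers are already in HL and DE.

```asm
  ld a,c           ; Low byte of the first number
  add a,e          ; Adding the low bytes; the 9th bit of the result goes into carry
  ld l,a           ; LD does not modify the flags, so the carry survives
  ld a,b           ; High byte of the first number
  adc a,d          ; Adding the high bytes plus the carry coming from the low bytes
  ld h,a           ; Now HL=BC+DE
```

After the last addition the carry holds the 17th bit of the sum, so the same scheme could be continued for even wider numbers.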

### Bit-level operations

You may already have encountered logical operations if you have programming experience, e. g. when examining conditions like “(i=1) and (j=2)”. On the CPU level, they are performed by the logical `and`, `or` and `xor` instructions. All three need an 8-bit operand, which can be anything the 8-bit arithmetic instructions accept (just about anything). Naturally, the A register is always involved, both as one of the factors and as the holder of the result. The individual bits are completely independent of each other in these operations, and the carry is always cleared after one of these instructions is executed (so if you want to do a 16-bit subtraction without carry, you can still do it with `sbc` by putting `or a` or `and a` before it, so the carry is guaranteed to be zero).
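The 16-bit subtraction mentioned in the parentheses can be sketched in two instructions. HL and DE are just example operands here.

```asm
  or a             ; A keeps its value (A or A = A), but the carry is cleared
  sbc hl,de        ; HL=HL-(DE+0), i. e. a plain 16-bit subtraction
```

This pair is needed because `sub` only exists in an 8-bit form.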

When `or` is performed, each bit of the result will be one if at least one of the factors had its corresponding bit set:

```
  %00101110
  %10011101  i. e. only two zeroes give zero, all the other combinations result in one
  ---------
  %10111111 (result)
```

In turn, `and` makes each bit of the result be zero if at least one of the factors had its corresponding bit reset:

```
  %00101110
  %10011101  i. e. only two ones give one, all the other combinations result in zero
  ---------
  %00001100 (result)
```

The third one, `xor` (which comes from the expression “exclusive or”) makes a bit of the result set if and only if one of the factors had its corresponding bit set, while the same bit of the other factor was reset prior to execution:

```
  %00101110
  %10011101  i. e. inequality gives one, while equality gives zero as a result
  ---------
  %10110011 (result)
```

You are going to use these instructions a lot. To close their discussion, here is a little trick: you can load zero into A by executing `xor a`, i. e. by `xor`ing A with itself (think about why this is true). This is useful as it is faster and smaller than `ld a,0` (one byte and 4 clock cycles instead of two bytes and 7 clock cycles). The only drawback is that it modifies the flags, but you do not usually need to preserve them for a long time anyway.
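A typical use of the three instructions is manipulating selected bits of A with a constant mask. The following sketch uses an arbitrary starting value and arbitrary masks just to show the effect of each operation.

```asm
  ld a,%10110110
  and %00001111    ; Keeps the lower four bits, clears the rest: A=%00000110
  or %10000000     ; Sets bit 7, leaves the rest alone:          A=%10000110
  xor %00000001    ; Flips bit 0, leaves the rest alone:         A=%10000111
```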

## Summing one-byte numbers

Let’s assume that we have five 8-bit numbers stored beginning at the address \$1000. We want to calculate their sum. For now, we assume that the sum itself will also remain within 8 bits (i. e. it will be less than 256). An inefficient but straightforward solution could be the following:

```
  ld b,0                         ; Initialising the partial sum
  ld a,($1000)                   ; Reading the first number into the accumulator
  add a,b                        ; Adding it to the sum
  ld b,a                         ; Writing the sum back to B
  ld a,($1001)                   ; Reading the second number into the accumulator
  add a,b                        ; Adding it to the sum
  ld b,a                         ; Writing the sum back to B
  ld a,($1002)                   ; Adding the 3rd value
  add a,b
  ld b,a
  ld a,($1003)                   ; Adding the 4th value
  add a,b
  ld b,a
  ld a,($1004)                   ; And adding the 5th value, too
  add a,b
  ld b,a
```

If you have the eyes of an eagle and noticed that something odd is going on at the beginning, you might ask why we do not `ld` the first value into B instead of zeroing it first. The answer is that this is not the final version. The first improvement could be using indirect addressing instead of directly addressing every single byte of the data.

```
  ld a,0                         ; Initialising the partial sum
  ld hl,$1000                    ; Initialising the pointer to the first byte of the data
  add a,(hl)                     ; Adding the current data to the sum
  inc hl                         ; Proceeding to the next byte of the data
  add a,(hl)
  inc hl
  add a,(hl)
  inc hl
  add a,(hl)
  inc hl
  add a,(hl)
  inc hl
  ld b,a                         ; We want the result in B
```

There are two advantages of this solution. First, you can directly add the data to the sum without having to load it into a register first. Naturally, addition implies that the sum is either in A (8-bit) or in HL/IX/IY (16-bit). The other, more important change is that now it is much easier to modify the program if you decide to sum another consecutive series of five numbers in the memory: you only need to modify the initialisation part. Besides these advantages, the code also improved in terms of speed and size. However, there is still a redundant `inc hl` at the end... Why? If you look at the code, you can see five `add`/`inc` pairs. As these pairs are completely identical, we could as well put them in a loop.

```
  ld c,5                         ; Setting up the loop counter
  ld a,0                         ; Initialising the partial sum
  ld hl,$1000                    ; Initialising the pointer to the first byte of the data
Repeat:                          ; Adds the current byte to the sum and proceeds
  add a,(hl)
  inc hl
  dec c                          ; Handling the loop (without djnz this time)
  jr nz,Repeat
  ld b,a                         ; We want the result in B
```

This also explains why we could not use `ld` for the first byte: we had to separate the initialisation from the actual calculation. Doing so is in general inefficient, but it helps to keep the code clean, which is useful in the development stage, although it is certainly worth optimising the code prior to releasing it. Introducing the loop also enables us to easily modify the number of bytes involved without having to add or remove code. However, when it is possible to use B as loop counter, it is advisable to do so, because `djnz` decrements B and jumps in a single instruction:

```
  ld b,5                         ; Setting up the loop counter
  ld a,0                         ; Initialising the partial sum
  ld hl,$1000                    ; Initialising the pointer to the first byte of the data
Repeat:                          ; Adds the current byte to the sum and proceeds
  add a,(hl)
  inc hl
  djnz Repeat
  ld b,a                         ; We want the result in B
```

After going through all this, it is important to note that using 8 bits for the sum is not really practical. Let’s extend it to 16 bits then. And to make it even better, I will also make the loop counter 16-bit:

```
  ld ix,$1000                    ; Pointer to the data
  ld hl,0                        ; The beginning sum
  ld bc,500                      ; Loop counter
  ld d,0                         ; The upper 8 bits of the numbers to be added
Repeat:                          ; Adds the current byte to the sum and proceeds
  ld e,(ix)
  add hl,de                      ; Note that D=0
  inc ix
  dec bc                         ; This instruction does not modify the flags!
  ld a,b                         ; Verifying whether the counter reached zero
  or c                           ; The zero flag is set if both bytes of BC are zero
  jr nz,Repeat
```

If you still don’t understand how this 16-bit counter works, try to remember the principle of the `or` operation: if the result is zero, then both values must have been zero. To perform the operation “B or C” the value of B has to be loaded into A, because all the logical operations expect one of the factors to be in A. Get used to this method, because it is frequently applied in practice.

## Adding large numbers

Let’s say we have two 16-byte (128-bit) numbers in the memory. The first is at \$1000, the second at \$1010. Their sum is to be put into the 16 bytes starting at \$1020. All the numbers start with the least significant byte. The magic word is carry in this case, which holds the bits transferred between the byte boundaries.

```
  ld ix,$1000                    ; Pointer to the first number
  ld b,16                        ; The number of bytes in each number
  or a                           ; A dummy logical instruction, used to clear the carry
Repeat:                          ; Adds 8 bits on each iteration
  ld a,(ix)
  adc a,(ix+$10)                 ; Add with carry (the 9th bit of the previous addition)
  ld (ix+$20),a                  ; Storing the current byte of the result
  inc ix
  djnz Repeat
```

Note that neither the 16-bit `inc` nor `djnz` alters the flags, and this is exactly why the loop works: the carry produced by `adc` survives until the next iteration. The loop would not work if the CPU designers had not thought about these cases.
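The same pattern should carry over to subtraction by swapping `adc` for `sbc`, in which case the carry acts as a borrow. Below is a sketch under the same memory layout as above (first number at \$1000, second at \$1010, result at \$1020). It computes the first number minus the second, which is my own choice of task for illustration.

```asm
  ld ix,$1000                    ; Pointer to the first number
  ld b,16                        ; The number of bytes in each number
  or a                           ; Clearing the carry: no borrow at the beginning
Repeat:                          ; Subtracts 8 bits on each iteration
  ld a,(ix)
  sbc a,(ix+$10)                 ; Subtract with carry (the borrow of the previous byte)
  ld (ix+$20),a                  ; Storing the current byte of the result
  inc ix
  djnz Repeat
```

If the carry is set after the loop, the second number was greater than the first one.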

## Moving data blocks

This is a typical programming task: you will certainly need to move data around in your programs. Let’s start with the elementary memory block movement. The aim is to move 500 bytes of data from the address \$2000 to \$4000. Fortunately the Z80 processor is capable of performing this task with a single instruction:

```
  ld hl,$2000                    ; Pointer to the source
  ld de,$4000                    ; Pointer to the destination
  ld bc,500                      ; Number of bytes to move
  ldir                           ; Moves BC bytes from (HL) to (DE)
```

The `ldir` instruction is a composite instruction, which is equivalent to the following piece of code:

```
Repeat:
  ld a,(hl)                      ; Getting the current byte
  inc hl
  ld (de),a                      ; Storing it
  inc de
  dec bc                         ; Handling the loop
  ld a,b
  or c
  jr nz,Repeat
```

The only difference (besides the rather obvious fact that `ldir` is much smaller and much faster than the loop above) is that the A register is not involved when using `ldir`. For the programmers’ convenience there is also an instruction called `ldi` which does almost the same thing except that it moves only one byte (but still updates all three registers: HL, DE and BC).
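One case where `ldi` is handy is when something else has to happen between two bytes. As a sketch: `ldi` leaves the parity/overflow flag set as long as BC has not reached zero, so the loop can be closed with `jp pe` (`jr` does not accept this condition). The addresses and the byte count are the ones from the example above, and the placeholder comment marks where extra work could go.

```asm
  ld hl,$2000                    ; Pointer to the source
  ld de,$4000                    ; Pointer to the destination
  ld bc,500                      ; Number of bytes to move
Repeat:
  ldi                            ; Moves one byte, then HL=HL+1, DE=DE+1, BC=BC-1
                                 ; Extra processing could be placed here, as long as it
                                 ; leaves the parity/overflow flag alone
  jp pe,Repeat                   ; The flag stays set while BC is not zero
```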

The `ldir` instruction can also be used to fill each byte of a memory area with a given value. I might as well call it a little trick, but it isn’t actually a complicated one. The following code fills 500 bytes from the address \$2000 with 150.

```
  ld hl,$2000                    ; Pointer to the source
  ld de,$2001                    ; Pointer to the destination
  ld bc,499                      ; Number of bytes to move
  ld (hl),150                    ; The value to fill
  ldir                           ; Moves BC bytes from (HL) to (DE)
```

What happens? If you think it over, you will realise that in each iteration the preceding byte is copied into the current byte, so the value of 150 written at the beginning is copied step by step into each byte of the region. This happens because the two regions (the source and the destination) overlap. Now you might wonder what to do if you really want to move these 500 bytes one byte ahead instead of filling them with the same value. The solution is simple: you have to start from the end of the region and go backwards. The instruction to do this is `lddr`, which does almost the same as `ldir`, with the only difference that it decrements HL and DE in each iteration. The example to move 500 bytes from the address \$2000 to \$2001:

```
  ld hl,$21F3                    ; Pointer to the end of the source (500=$1F4)
  ld de,$21F4                    ; Pointer to the end of the destination
  ld bc,500                      ; Number of bytes to move
  lddr                           ; Moves BC bytes from (HL) to (DE) backwards
```

Note that if the overlapping is the other way around, i. e. the destination is at the lower address, you have to use `ldir`. Think about this before proceeding to the next section.
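For comparison, here is a sketch of the opposite case, moving 500 bytes from \$2001 one byte back to \$2000. The addresses are chosen only to mirror the previous example.

```asm
  ld hl,$2001                    ; Pointer to the beginning of the source
  ld de,$2000                    ; Pointer to the beginning of the destination
  ld bc,500                      ; Number of bytes to move
  ldir                           ; Every byte is read before it could be overwritten
```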

## Manipulating data blocks

### Simple conditions

After getting to know some elementary methods, we can start thinking about practical problems. The next task is a bit more complicated: there are 200 numbers (8-bit signed integers) stored from the address \$1000, and we want to separate the negative and the non-negative numbers. We want to create two separate lists: that of the non-negative numbers at \$2000 and the negative values at \$3000. A possible solution could look like this:

```
  ld hl,$1000                    ; Pointer to the data
  ld ix,$2000                    ; Pointer to the non-negative list
  ld iy,$3000                    ; Pointer to the negative list
  ld b,200                       ; Loop counter
Repeat:
  ld a,(hl)                      ; Getting and checking the sign of the current element
  inc hl
  cp $80
  jr nc,Negative
  ld (ix),a                      ; Storing a non-negative value
  inc ix
  jr Continue
Negative:
  ld (iy),a                      ; Storing a negative value
  inc iy
Continue:
  djnz Repeat
```

A comment for programmers of TI calculators: the IY register is reserved for the system, so you can only use it if you save its value and disable interrupts. In this example, you could use DE instead of IY, but in a normal everyday situation you will most probably find all your registers full of important data, particularly the general purpose registers (A, B, C, D, E, H, L)...
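A possible IY-free variant of the routine is sketched below, with DE holding the pointer to the negative list. The sign is checked with `bit 7,a` here, which is merely an alternative to `cp $80` and not a required change.

```asm
  ld hl,$1000                    ; Pointer to the data
  ld ix,$2000                    ; Pointer to the non-negative list
  ld de,$3000                    ; Pointer to the negative list (DE instead of IY)
  ld b,200                       ; Loop counter
Repeat:
  ld a,(hl)                      ; Getting the current element
  inc hl
  bit 7,a                        ; Bit 7 is the sign; the zero flag is set if it is 0
  jr nz,Negative
  ld (ix),a                      ; Storing a non-negative value
  inc ix
  jr Continue
Negative:
  ld (de),a                      ; Storing a negative value
  inc de
Continue:
  djnz Repeat
```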

### Sorting

This time I would like to show a way to implement simple bubble sort in Z80 assembly. For those who don’t know the algorithm, here is the explanation:

1. Going through the array from the beginning to the end, if two neighbouring elements are in the wrong order, we swap them. This way the greatest number will become the last element of the list.
2. We repeat the step above, but this time we do not include the last element, we stop before reaching it. This way the second greatest number will also be put into its proper place.
3. We keep repeating the second step with a decreasing number of elements until this number becomes one. Then we are done. We will also stop sorting if there was no need to swap in an intermediate step, because that implies that the elements are already in the right order.

The code:

```
  ld c,NumberOfElements
  dec c                          ; Note that the first step involves N-1 checks
  ld hl,1                        ; Setting H=0 and L=1, for optimising speed
Step:
  ld ix,Array                    ; Each step restarts at the first element
                                 ; (Array is an assumed label for the data)
  ld e,h                         ; Bit 0 of E will indicate if there was need to swap
  ld b,c                         ; C holds the number of elements in the current step
Loop:
  ld a,(ix)
  ld d,(ix+1)
  cp d                           ; If A was less than D, the carry will be set
  jr c,Continue
  ld (ix),d                      ; Swapping order is actually performed by simply writing
  ld (ix+1),a                    ; the values back in a reversed order
  ld e,l                         ; Swapping is indicated here (L=1)
Continue:
  inc ix
  djnz Loop
  dec e
  jr nz,Finish                   ; If E became zero after DEC, we have to continue
  dec c
  jr nz,Step
Finish:
```

Of course, bubble sort is one of the slowest sorting algorithms, but it is easy to understand. Later, in the advanced section you are going to find an implementation of the QuickSort algorithm, too.

### Searching

Another useful thing is searching for byte sequences in the memory, e. g. strings in a text. The program below does the following: given the address and length of a text, and the same parameters of a string to be found in it (all four are 2-byte words), it returns the (first) address where the string is found in HL. If the text does not contain the string given, it returns 0 in HL.

```
Start:
  ld hl,(TextAddress)            ; Current position in the text
  ld de,(StringAddress)          ; Beginning of the string (StringAddress is an assumed
                                 ; label, named after the other three parameters)
  ld bc,(StringLength)
Repeat:                          ; This loop verifies if the text from the current byte
  ld a,(de)                      ; matches the string given, character by character. If
  cp (hl)                        ; it does, then the zero flag is set. Execution is
  jr nz,EndRepeat                ; continued from EndRepeat, regardless of the success of
  inc hl                         ; the search.
  inc de
  dec bc
  ld a,b
  or c
  jr nz,Repeat
EndRepeat:
  ld hl,(TextAddress)            ; Note that LD preserves the flags
  jr z,Finish
  inc hl                         ; The text pointer is advanced
  ld (TextAddress),hl            ; and stored for the next attempt
  ld bc,(TextLength)
  dec bc                         ; Total byte count is decreased
  ld (TextLength),bc
  ld a,b
  or c
  jr nz,Start
  ld hl,0                        ; This part is executed in case of failure (BC=0)
Finish:
  ...                            ; There should be some code following here, otherwise
                                 ; execution would continue in the data part...