Why do C to Z80 compilers produce poor code?

Quite often people don’t know how to use the compilers or don’t fully understand the consequences of the code they write. The Z80 C compilers do optimize, but not as thoroughly as, say, gcc. I also often see people fail to turn up the optimization level when they compile.

There is an example in introspec’s post, which I cannot comment on directly for lack of reputation points:

char i,data[10];

void main(void) 
{
  for (i=0; i<10; i++)
    data[i]=0;
}

There are several problems with this code that he is not considering. By declaring i as plain char, he may be making it signed (that is at the compiler's discretion). In comparisons, the 8-bit quantity is then sign-extended before being compared, because C promotes both operands to int unless the compiler can show that a narrower comparison is equivalent. And by making i global, he ensures the compiler cannot hold the for-loop index in a register inside the loop.
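The effect of signedness on a comparison can be seen on any C compiler, not just a Z80 one. The following is a minimal sketch; the function names are made up for illustration and are not part of z88dk or sdcc.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this compiler, 0 if unsigned.
   The C standard leaves the choice to the implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Signed 8-bit operands: both are promoted to int, which means
   sign extension on an 8-bit target. */
int less_than_signed(signed char a, signed char b)
{
    return a < b;
}

/* Unsigned 8-bit operands: promotion is a zero extension, so the
   result matches a plain 8-bit unsigned compare. */
int less_than_unsigned(unsigned char a, unsigned char b)
{
    return a < b;
}

/* Hypothetical self-check: the bit pattern 0xC8 is 200 when unsigned
   and -56 when read as a two's complement signed char. */
void check_signedness_examples(void)
{
    assert(less_than_unsigned(200, 10) == 0);  /* 200 < 10 is false */
    assert(less_than_signed(-56, 10) == 1);    /* -56 < 10 is true  */
}
```

The same byte gives opposite answers depending on the declared type, which is why the compiler has to emit different (and for the signed case usually longer) code.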

There are two C compilers in z88dk. One is sccz80, which is the most advanced iteration of Ron Cain's original compiler from the late 70s; it's mostly C90 now. This compiler is not an optimizing compiler; its intention is to generate small code instead. So you will see many compiler primitives being carried out in subroutine calls. The idea behind it is that z88dk provides a substantial C library written entirely in assembly language, so the C compiler is intended to produce glue code while the execution time is spent in hand-written assembler.

The other C compiler is a fork of sdcc called zsdcc. It has been improved upon and produces better and smaller code than sdcc itself does. sdcc is an optimizing compiler, but it tends to produce larger code than sccz80 and overuses the Z80's index registers. The version in z88dk, zsdcc, fixes many of these issues and now produces code of comparable size to sccz80 when the --opt-code-size switch is used.

This is what I get for the above when I compile using sccz80:

zcc +zx -vn -a -clib=new test.c

(the -O3 switch is for code size reduction but I prefer the default -O2 most of the time)

._main
    ld  hl,0    ;const
    ld  a,l
    ld  (_i),a
    jp  i_4
.i_2
    ld  hl,_i
    call    l_gchar
    inc hl
    ld  a,l
    ld  (_i),a
    dec hl
.i_4
    ld  hl,_i
    call    l_gchar
    ld  de,10   ;const
    ex  de,hl
    call    l_lt
    jp  nc,i_3
    ld  hl,_data
    push    hl
    ld  hl,_i
    call    l_gchar
    pop de
    add hl,de
    ld  (hl),#(0 % 256)
    ld  l,(hl)
    ld  h,0
    jp  i_2
.i_3
    ret

Here you see the subroutine calls for compiler primitives and the fact that the compiler is forced to use memory to hold the for-loop index. "l_lt" is a signed comparison.

A zsdcc compile with optimization turned up:

zcc +zx -vn -a -clib=sdcc_iy -SO3 --max-allocs-per-node200000 test.c

_main:
    ld  hl,_i
    ld  (hl),0x00
l_main_00102:
    ld  hl,(_i)
    ld  h,0x00
    ld  bc,_data
    add hl,bc
    xor a,a
    ld  (hl),a
    ld  hl,_i
    ld  a,(hl)
    inc a
    ld  (hl),a
    sub a,0x0a
    jr  C,l_main_00102
    ret

By default char is unsigned in zsdcc, and the compiler notices that the comparison "i<10" can be done in 8 bits. C rules say both sides should be promoted to int, but a compiler may skip that when it can show that a narrower comparison gives the same result. When you don't specify that your chars are unsigned, this promotion can lead to the insertion of sign-extension code.
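That the 8-bit test is equivalent to the promoted one can be checked exhaustively, since there are only 256 values. This is a sketch under a stated assumption: carry_after_sub is a hand-written model of the Z80 carry flag after "sub a,n", not something taken from either compiler.

```c
#include <assert.h>

/* Model of the Z80 carry flag after "sub a,n": carry is set when the
   8-bit subtraction borrows, that is when a < n as unsigned values. */
int carry_after_sub(unsigned char a, unsigned char n)
{
    return (((unsigned)a - (unsigned)n) & 0x100u) != 0;
}

/* Counts the 8-bit values for which the carry-based test disagrees
   with the int-promoted comparison "i < 10". */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v < 256; v++) {
        unsigned char i = (unsigned char)v;
        int promoted = (int)i < 10;          /* what the C rules describe */
        int narrow = carry_after_sub(i, 10); /* what the 8-bit code tests */
        if (promoted != narrow)
            mismatches++;
    }
    return mismatches;
}

/* Hypothetical self-check: the two tests agree for every value. */
void check_narrow_compare(void)
{
    assert(count_mismatches() == 0);
}
```

For an unsigned char the two forms never disagree, so the compiler is free to drop the promotion.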

If I now make the char explicitly unsigned and declare i inside the for-loop:

unsigned char data[10];

void main(void)
{
  for (unsigned char i=0; i<10; i++)
    data[i]=0;
}

sccz80 does this:

zcc +zx -vn -a -clib=new test.c

._main
    dec sp
    pop hl
    ld  l,#(0 % 256)
    push    hl
    jp  i_4
.i_2
    ld  hl,0    ;const
    add hl,sp
    inc (hl)
.i_4
    ld  hl,0    ;const
    add hl,sp
    ld  a,(hl)
    cp  #(10 % 256)
    jp  nc,i_3
    ld  de,_data
    ld  hl,2-2  ;const
    add hl,sp
    ld  l,(hl)
    ld  h,0
    add hl,de
    ld  (hl),#(0 % 256 % 256)
    ld  l,(hl)
    ld  h,0
    jp  i_2
.i_3
    inc sp
    ret

The comparison is now 8-bit and no subroutine calls are used. However, sccz80 cannot put the index i into a register - it does not carry enough information to do that so it instead makes it a stack variable.

The same for zsdcc:

zcc +zx -vn -a -clib=sdcc_iy -SO3 --max-allocs-per-node200000 test.c

_main:
    ld  bc,_data+0
    ld  e,0x00
l_main_00103:
    ld  a, e
    sub a,0x0a
    ret NC
    ld  l,e
    ld  h,0x00
    add hl, bc
    ld  (hl),0x00
    inc e
    jr  l_main_00103

Comparisons are unsigned and 8-bit. The for loop variable is kept in register E.

What if we walk the array instead of indexing it?

unsigned char data[10];

void main(void)
{
  for (unsigned char *p = data; p != data+10; ++p)
      *p = 0;
}

zcc +zx -vn -a -clib=sdcc_iy -SO3 --max-allocs-per-node200000 test.c

_main:
    ld  bc,_data
l_main_00103:
    ld  a, c
    sub a,+((_data+0x000a) & 0xFF)
    jr  NZ,l_main_00116
    ld  a, b
    sub a,+((_data+0x000a) / 256)
    jr  Z,l_main_00105
l_main_00116:
    xor a, a
    ld  (bc), a
    inc bc
    jr  l_main_00103
l_main_00105:
    ret

The pointer is held in BC and the end condition is a 16-bit comparison; the result is that the main loop takes about the same amount of time as the indexed version.

Then the question is: why isn't this done with memset?

#include <string.h>

unsigned char data[10];

void main(void)
{
    memset(data, 0, 10);
}

zcc +zx -vn -a -clib=sdcc_iy -SO3 --max-allocs-per-node200000 test.c

_main:
    ld  b,0x0a
    ld  hl,_data
l_main_00103:
    ld  (hl),0x00
    inc hl
    djnz    l_main_00103
    ret

For larger transfers this becomes an inlined ldir.
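The inlined loop in the listing above is a count-down loop over a pointer. Written back in C it looks roughly like the sketch below; clear_countdown is a hypothetical name, and whether a given compiler turns this form into djnz is not guaranteed.

```c
#include <assert.h>

/* Clears n bytes starting at p, counting down to zero.
   Precondition: n is 1..255; like djnz with B=0, passing 0 would
   run the body 256 times. */
void clear_countdown(unsigned char *p, unsigned char n)
{
    do {
        *p++ = 0;
    } while (--n);
}

/* Hypothetical self-check with guard bytes on both sides. */
void check_clear_countdown(void)
{
    unsigned char buf[12];
    for (int k = 0; k < 12; k++)
        buf[k] = 0xAA;
    clear_countdown(buf + 1, 10);
    assert(buf[0] == 0xAA && buf[11] == 0xAA);
    for (int k = 1; k <= 10; k++)
        assert(buf[k] == 0);
}
```

The 8-bit counter and the test at the bottom of the loop are what make this shape a natural fit for a decrement-and-branch instruction.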

In general, the C compilers do not currently generate the common Z80 CISC instructions (ldir, cpir, djnz, etc.) from ordinary C code, although they do in certain circumstances, as shown above. They are also unable to use the exx register set. However, the substantial C library that comes with z88dk does make full use of the Z80 architecture, so anyone using the library will benefit from asm-level performance (sdcc's own library is written in C, so it is not at the same performance level). Beginner C programmers usually don't use the library because they're not familiar with it, and that comes on top of the performance mistakes they make when they don't understand how the C maps to the underlying processor.
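As a small illustration of leaning on the library instead of hand-written loops, here is a sketch that uses only standard string.h functions. The buffer and function names are made up for the example, and which instructions a particular library uses internally for these calls is an assumption, not something this code can show.

```c
#include <assert.h>
#include <string.h>

unsigned char src_buf[10];
unsigned char dst_buf[10];

/* Fill and block copy done by library calls rather than for-loops. */
void fill_and_copy(void)
{
    memset(src_buf, 0x55, sizeof src_buf);
    memcpy(dst_buf, src_buf, sizeof dst_buf);
}

/* Returns the index of the first byte equal to value in dst_buf,
   or -1 if it is not present. */
int find_byte(unsigned char value)
{
    unsigned char *p = (unsigned char *)memchr(dst_buf, value, sizeof dst_buf);
    return p ? (int)(p - dst_buf) : -1;
}

/* Hypothetical self-check of the two helpers. */
void check_library_calls(void)
{
    fill_and_copy();
    assert(find_byte(0x55) == 0);
    assert(find_byte(0x00) == -1);
}
```

Fill, copy and search are exactly the operations the block instructions were designed for, so they are where a library written in assembler has the most to offer.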

The C compilers are not able to do everything, but they're not helpless either. To get the best code out, you have to understand the consequences of the kind of C code you write and not just throw something together.
