On 11/03/14 01:40, DJ Delorie wrote:
>> I'm curious.  Have you tried out other approaches before you decided
>> to go with the virtual registers?
>
> Yes.  Getting GCC to understand the "unusual" addressing modes the
> RL78 uses was too much for the register allocator to handle.  Even
> when the addressing modes are limited to "usual" ones, GCC doesn't
> have a good way to do regalloc and reload when there are limits on
> what registers you can use in an address expression, and it's worse
> when there are dependencies between operands, or limited numbers of
> address registers.

Is it possible that the virtual pass causes inefficiencies in some cases by sticking with r8-r31 when one of the 'normal' registers would be better?

For example, I'm having a devil of a time convincing the compiler that an immediate value can be stored directly in any of the normal 16-bit registers (e.g. 'movw hl, #123'). I'm beginning to wonder whether it's the unoptimized code being fed in that's causing problems.

Taking a slight variation on my original test code (removing the 'volatile' keyword and accessing an 8-bit memory location):

--------

#define SOE0L (*(unsigned char *)0xF012A)

void orTest()
{
   SOE0L |= 3;
}

--------

produces (with -O0)

_test:
  0000 C9 F0 2A 01    movw    r8, #298
  0004 C9 F2 2A 01    movw    r10, #298
  0008 AD F2          movw    ax, r10
  000a BD F4          movw    r12, ax
  000c FA F4          movw    hl, r12
  000e 8B             mov     a, [hl]
  000f 9D F2          mov     r10, a
  0011 6A F2 03       or      r10, #3
  0014 AD F0          movw    ax, r8
  0016 BD F4          movw    r12, ax
  0018 DA F4          movw    bc, r12
  001a 8D F2          mov     a, r10
  001c 48 00 00       mov     [bc], a
  001f D7             ret

In some cases, the normal optimization steps remove a lot, if not all, of the unnecessary register passing, but not always.

The conditions on the movhi_real insn allow an immediate value to be stored in (for example) HL directly, and yet I cannot find a single instance in my project where it isn't in the form of

movw    r8, #298
movw    ax, r10
movw    hl, ax

and no amount of rearranging the conditions (that I've found) will cause the correct code to be generated. The compiler is determined to put the immediate value into rX and then copy that into ax (which is also unnecessary).

I see the same problem with 'cmp' when the value to be compared is in the A register:

mov     r8, a
cmp     r8, #3

The A register is the one register that can be almost guaranteed to be usable with any instruction, and copying it to R8 (or wherever) to perform the comparison not only wastes two bytes for the move but also makes the cmp instruction a byte longer, so five bytes are used instead of two.
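
For anyone who wants to look for the same 'cmp' pattern, a minimal C sketch of the kind of test case I have in mind is below. The function name cmpTest is made up for illustration, and I am only assuming that a byte-sized comparison against a small constant like this ends up with the value in A before the compare; I haven't verified that this exact function triggers the copy to r8.

```c
/* Hypothetical reproducer: compare an 8-bit value against a small
   immediate.  The hope (an assumption, not a verified result) is that
   the value is already in A when the comparison is emitted, so that
   'cmp a, #3' would be the ideal encoding. */
unsigned char cmpTest(unsigned char value)
{
    if (value == 3)
        return 1;   /* value matched the immediate */
    return 0;       /* value did not match */
}
```

Compiling this at different optimization levels and comparing the listings should show whether the extra 'mov r8, a' survives.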

I looked at the code produced for IA64 and ARM targets, and although I'm not as familiar with those instruction sets, they didn't appear to do as much needless copying, which strengthens my suspicion that it's something in the RL78 backend that needs 'tweaking'.

The suggestions made regarding 'volatile' were very helpful, and I've made some good savings elsewhere by adding support for different addressing modes and more efficient instructions. However, there are still a number of (theoretically) easy pickings that I feel should be possible before more complicated optimizations need to be looked at.

As ever, any suggestions are very gratefully received. I hope to be able to post some patches once I'm comfortable that I haven't missed anything obvious or done something stupid.

Regards,

Richard.
