Re: [RL78] Questions about code-generation

Richard Hulme Fri, 21 Mar 2014 15:36:22 -0700

On 11/03/14 01:40, DJ Delorie wrote:

>> I'm curious.  Have you tried out other approaches before you decided
>> to go with the virtual registers?
>
> Yes.  Getting GCC to understand the "unusual" addressing modes the
> RL78 uses was too much for the register allocator to handle.  Even
> when the addressing modes are limited to "usual" ones, GCC doesn't
> have a good way to do regalloc and reload when there are limits on
> what registers you can use in an address expression, and it's worse
> when there are dependencies between operands, or limited numbers of
> address registers.

Is it possible that the virtual pass causes inefficiencies in some cases
by sticking with r8-r31 when one of the 'normal' registers would be
better?

For example, I'm having a devil of a time convincing the compiler that
an immediate value can be stored directly in any of the normal 16-bit
registers (e.g. 'movw hl, #123').  I'm beginning to wonder whether it's
the unoptimized code being fed in that's causing problems.

Taking a slight variation on my original test code (removing the
'volatile' keyword and accessing an 8-bit memory location):


--------

#define SOE0L (*(unsigned char *)0xF012A)

void orTest()
{
   SOE0L |= 3;
}

--------

produces (with -O0)

  28                                    _test:
  29 0000 C9 F0 2A 01                           movw    r8, #298
  30 0004 C9 F2 2A 01                           movw    r10, #298
  31 0008 AD F2                                 movw    ax, r10
  32 000a BD F4                                 movw    r12, ax
  33 000c FA F4                                 movw    hl, r12
  34 000e 8B                                    mov     a, [hl]
  35 000f 9D F2                                 mov     r10, a
  36 0011 6A F2 03                              or      r10, #3
  37 0014 AD F0                                 movw    ax, r8
  38 0016 BD F4                                 movw    r12, ax
  39 0018 DA F4                                 movw    bc, r12
  40 001a 8D F2                                 mov     a, r10
  41 001c 48 00 00                              mov     [bc], a
  42 001f D7                                    ret

In some cases, the normal optimization steps remove a lot, if not all,
of the unnecessary register passing, but not always.

The conditions on the movhi_real insn allow an immediate value to be
stored in (for example) HL directly, and yet I cannot find a single
instance in my project where it isn't in the form of


movw    r8, #298
movw    ax, r10
movw    hl, ax

and no manner of re-arranging the conditions (that I've found) will
cause the correct code to be generated.  It's determined to put the
immediate value into rX, and then copy that into ax (which is also
unnecessary).

I see the same problem with 'cmp' when the value to be compared is in
the A register:


mov     r8, a
cmp     r8, #3

The A register is the one register that can be almost guaranteed to be
usable with any instruction, and copying it to R8 (or wherever) to
perform the comparison not only wastes two bytes for the move but also
makes the cmp instruction a byte longer, so five bytes are used instead
of two.

I looked at the code produced for IA64 and ARM targets, and although I'm
not as familiar with those instruction sets, they didn't appear to do as
much needless copying, which strengthens my suspicion that it's
something in the RL78 backend that needs 'tweaking'.

The suggestions made regarding 'volatile' were very helpful and I've
made some good savings elsewhere by adding support for different
addressing modes and more efficient instructions, but there are still a
number of (theoretically) easy pickings that should (I feel) be possible
before more complicated optimizations need to be looked at.

As ever, any suggestions are very gratefully received.  I hope to be
able to post some patches once I'm comfortable that I haven't missed
anything obvious or done something stupid.


Regards,

Richard.
