Hi,

I ran into this problem in a private backend, but it is reproducible
with x86 GCC as well, so I suspect a core GCC problem. Let's consider
a simple example:

unsigned int buffer[10];

__attribute__((noinline)) void
myFunc(unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int tmp;
  if( a & 0x2 )
    {
      tmp = 0x3221;
      tmp |= (a & 0xF) << 24;  /* common to both branches */
      tmp |= (a & 0x3) << 2;   /* common to both branches */
    }
  else
    {
      tmp = 0x83621;
      tmp |= (a & 0xF) << 24;
      tmp |= (a & 0x3) << 2;
    }
  buffer[0] = tmp;
}

Compiling it with -Os yields the following assembly:

movl %edi, %eax
andl $15, %eax
sall $24, %eax
testb $2, %dil
je .L2
andl $3, %edi
sall $2, %edi
orl %edi, %eax
orl $12833, %eax
jmp .L3
.L2:
andl $3, %edi
sall $2, %edi
orl %edi, %eax
orl $538145, %eax
.L3:
movl %eax, buffer(%rip)
ret

Both branches contain a large common code fragment:

andl $3, %edi
sall $2, %edi
orl %edi, %eax

It could potentially be hoisted above the branch to reduce code size,
but GCC does not do this, not even at -O2.
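
To make the intent concrete, here is a hand-hoisted equivalent that I
would expect GCC to reach internally (just a sketch; myFunc_hoisted is
an illustrative name, not code from my tree):

__attribute__((noinline)) void
myFunc_hoisted(unsigned int a, unsigned int b, unsigned int c)
{
  /* The subexpression shared by both branches, computed once. */
  unsigned int common = ((a & 0xF) << 24) | ((a & 0x3) << 2);
  unsigned int tmp;
  if (a & 0x2)
    tmp = 0x3221 | common;
  else
    tmp = 0x83621 | common;
  buffer[0] = tmp;
}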

Things are even worse in my backend: my target has a conditional OR
instruction, so in this case the whole function could be linearized
(if-converted) with a good performance impact. The only reason it is
not is that GCC fails to hoist the common part out of the if-else
before it attempts conditional execution.
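
In source terms, the fully linear form I hope for on my target is
roughly the following, where after hoisting the branch collapses to a
select of the initial constant (again only a sketch; myFunc_linear is
an illustrative name):

__attribute__((noinline)) void
myFunc_linear(unsigned int a, unsigned int b, unsigned int c)
{
  /* Only the initial constant differs between the branches, so the
     if-else reduces to selecting one of two constants. */
  unsigned int tmp = (a & 0x2) ? 0x3221u : 0x83621u;
  tmp |= (a & 0xF) << 24;
  tmp |= (a & 0x3) << 2;
  buffer[0] = tmp;
}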

Can someone advise me how to tune the target machine options to signal
that hoisting common parts out of conditional branches is profitable?
Or is writing a custom pass the only way?

---
With best regards, Konstantin