libgcc: strange optimization

Michael Walle Mon, 01 Aug 2011 13:30:58 -0700

Hi list,


consider the following test code:
 static inline void f1(int arg)
 {
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm("scall" : : "r"(a1), "r"(a2));
 }

 void f2(int arg)
 {
   f1(arg >> 10);
 }


If you compile this code with 'lm32-gcc -O1 -S -c test.c' (output at the end
of this email), the a1 = 10; assignment is optimized away. As far as I
understand, the following happens:

 1) function inlining
 2) deferred argument evaluation
 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
    function call to libgcc's __ashrsi3 (_in place_!)
 4) BAM! dead code elimination optimizes the r8 assignment away because
    calli may clobber r1-r10 (caller-saved registers on lm32).
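
The effect of steps 3 and 4 can be modelled in portable C. This is only a toy
model of the register file, not what GCC does internally: regs, helper_shift,
fake_scall, deferred_order and eager_order are names made up for this sketch,
and the "clobber" is simulated by overwriting the array.

```c
/* Toy register file: regs[1]..regs[10] stand in for r1-r10. */
static int regs[11];

/* Stand-in for a libgcc helper such as __ashrsi3: computes the shift,
   then trashes every call-clobbered register, as a real call may do. */
static int helper_shift(int value, int count)
{
    int result = value >> count;
    for (int i = 1; i <= 10; i++)
        regs[i] = -1;   /* contents are no longer defined after the call */
    return result;
}

/* Stand-in for "scall": reports the value it finds in r8. */
static int fake_scall(void)
{
    return regs[8];
}

/* Order described in the list above: r8 is set first, then the helper
   call is emitted "in place" while computing the value for r1. */
static int deferred_order(int arg)
{
    regs[8] = 10;
    regs[1] = helper_shift(arg, 10);
    return fake_scall();
}

/* Order the source code suggests: the argument is evaluated first,
   then the body of f1 runs. */
static int eager_order(int arg)
{
    int shifted = helper_shift(arg, 10);
    regs[8] = 10;
    regs[1] = shifted;
    return fake_scall();
}
```

In this model eager_order() still sees 10 in r8, while deferred_order() does
not, which matches the missing "addi r8, r0, 10" in the first assembler
listing at the end of this email.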

If you use:
 void f2(int arg)
 {
   f1(__ashrsi3(arg, 10));
 }
everything works as expected: __ashrsi3 is evaluated before the body of f1.

According to Wikipedia [1], function calls are sequence points and all
side effects for the arguments are completed before entering the function.
So in my understanding the deferred argument evaluation is wrong if that
operation is emitted as a call to a libgcc helper.

I tried that on other architectures too (microblaze and avr). All show the
same behaviour. If an integer arithmetic opcode is translated to a call to
libgcc, every assignment to a register which is clobbered by the call is
optimized away.

The GCC manual mentions some caveats when using explicit register variables [2]:
  In the above example, beware that a register that is call-clobbered by
  the target ABI will be overwritten by any function call in the
  assignment, including library calls for arithmetic operators. Also a
  register may be clobbered when generating some operations, like variable
  shift, memory copy or memory move on x86. Assuming it is a call-clobbered
  register, this may happen to r0 above by the assignment to p2. If you
  have to use such a register, use temporary variables for expressions
  between the register assignment.

But I think this may not apply to the case above, where the arithmetic
operator is an argument of the called function. I.e. there is a sequence
point and the statements must not be reordered.
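
If inlining really is what lets the library call land between the register
assignments and the asm, one conservative way to sidestep it, at the cost of
a real call, might be to keep f1 out of line. A minimal sketch, assuming
GCC's noinline attribute is acceptable here. This is not taken from the
manual text quoted above. The #else branch and the seen_a1/seen_a2 variables
are hypothetical and exist only so the example also builds on a non-lm32
host, and the __lm32__ macro is assumed to be predefined by lm32-gcc.

```c
#if !defined(__lm32__)
/* Host-only bookkeeping (hypothetical): what the stand-in "scall" saw. */
static int seen_a1, seen_a2;
#endif

/* noinline keeps f1 a real call, so its argument has to be fully
   evaluated (libgcc helper included) before the body is entered. */
__attribute__((noinline))
static void f1(int arg)
{
#if defined(__lm32__)   /* assumed to be predefined by lm32-gcc */
    register int a1 asm("r8") = 10;
    register int a2 asm("r1") = arg;

    asm volatile("scall" : : "r"(a1), "r"(a2));
#else
    /* Any other target: record the values so the sketch builds and runs. */
    seen_a1 = 10;
    seen_a2 = arg;
#endif
}

void f2(int arg)
{
    f1(arg >> 10);
}
```

With f1 out of line, the body between the register assignments and the asm
contains no arithmetic that could turn into a library call.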


Assembler output (lm32-gcc -O1 -S -c test.c):
f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

Assembler output with no DCE (lm32-gcc -O1 -S -fno-dce -c test.c):
f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r8, r0, 10
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

[1] http://en.wikipedia.org/wiki/Sequence_point
[2] http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Example%20of%20asm%20with%20clobbered%20asm%20reg

-- 
Michael
