Hi list,
consider the following test code:

    static void inline f1(int arg)
    {
        register int a1 asm("r8") = 10;
        register int a2 asm("r1") = arg;

        asm("scall" : : "r"(a1), "r"(a2));
    }

    void f2(int arg)
    {
        f1(arg >> 10);
    }

If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of
this email), the a1 = 10; assignment is optimized away.

According to my understanding the following happens:
 1) function inlining
 2) deferred argument evaluation
 3) because our target has no barrel shifter, (arg >> 10) is emitted as
    a function call to libgcc's __ashrsi3 (_in place_!)
 4) BAM! dead code elimination optimizes the r8 assignment away, because
    calli may clobber r1-r10 (caller-saved registers on lm32).

If you use:

    void f2(int arg)
    {
        f1(__ashrsi3(arg, 10));
    }

everything works as expected: __ashrsi3 is evaluated before the body of
f1.

According to Wikipedia [1], function calls are sequence points, and all
side effects of the arguments are completed before entering the
function. So in my understanding the deferred argument evaluation is
wrong if that operation is emitted as a call to a libgcc helper.

I tried this on other architectures too (microblaze and avr). All show
the same behaviour: if an integer arithmetic opcode is translated to a
call to libgcc, every assignment to a register which is clobbered by
the call is optimized away.

The GCC manual mentions some caveats when using explicit register
variables [2]:

    In the above example, beware that a register that is call-clobbered
    by the target ABI will be overwritten by any function call in the
    assignment, including library calls for arithmetic operators. Also a
    register may be clobbered when generating some operations, like
    variable shift, memory copy or memory move on x86. Assuming it is a
    call-clobbered register, this may happen to r0 above by the
    assignment to p2. If you have to use such a register, use temporary
    variables for expressions between the register assignment.
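To make the sequence-point expectation concrete, here is a minimal
host-side sketch in plain C. It is not lm32 code and does not reproduce
the bug. All names in it (my_ashr, callee, ordering_ok and the tick
counters) are made up for illustration; my_ashr only stands in for
libgcc's __ashrsi3 and callee for f1. It records the order in which the
helper call and the callee body run.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
static int tick;         /* global event counter              */
static int helper_tick;  /* value of tick when the helper ran */
static int body_tick;    /* value of tick when the body ran   */

/* Stands in for libgcc's __ashrsi3: a shift done via a real call. */
static int my_ashr(int value, int count)
{
    helper_tick = ++tick;
    return value >> count;
}

/* Stands in for f1: records when its body starts executing. */
static int callee(int arg)
{
    body_tick = ++tick;
    return arg;
}

/* Returns 1 if the argument was fully evaluated before the body ran. */
int ordering_ok(void)
{
    tick = helper_tick = body_tick = 0;

    int result = callee(my_ashr(4096, 10));

    assert(result == 4);  /* 4096 >> 10 */
    return helper_tick == 1 && body_tick == 2;
}
```

With an explicit helper call as the argument, the helper always runs
before the callee body, which matches what happens with
f1(__ashrsi3(arg, 10)).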
But I think the manual's caveat does not apply to the case above, where
the arithmetic operator is an argument of the called function. That is,
there is a sequence point, so the statements must not be reordered.

Assembler output (lm32-gcc -O1 -S -c test.c):

    f2:
        addi sp, sp, -4
        sw (sp+4), ra
        addi r2, r0, 10
        calli __ashrsi3
        scall
        lw ra, (sp+4)
        addi sp, sp, 4
        b ra

Assembler output with no DCE (lm32-gcc -O1 -S -fno-dce -c test.c):

    f2:
        addi sp, sp, -4
        sw (sp+4), ra
        addi r8, r0, 10
        addi r2, r0, 10
        calli __ashrsi3
        scall
        lw ra, (sp+4)
        addi sp, sp, 4
        b ra

[1] http://en.wikipedia.org/wiki/Sequence_point
[2] http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Example%20of%20asm%20with%20clobbered%20asm%20reg

--
Michael