On Thu, Feb 21, 2013 at 03:57:05PM +0400, Konstantin Vladimirov wrote:
> Hi,
>
> Discovered this optimization possibilty on private backend, but can
> easily reproduce on x86
>
> Consider code, say test.c:
>
> static __attribute__((noinline)) unsigned int*
> proxy1( unsigned int* codeBuffer, unsigned int oper, unsigned int a, unsigned
> in
> {
>   return codeBuffer;
> }
>
> static __attribute__((noinline)) unsigned int*
> proxy2( unsigned int* codeBuffer, unsigned int oper, unsigned int a, unsigned
> in
> {
>   return codeBuffer;
> }
>
> __attribute__((noinline)) unsigned int*
> myFunc( unsigned int* codeBuffer, unsigned int oper)
> {
>   if( (oper & 0xF) == 14)
>   {
>     return proxy1( codeBuffer, oper, 0x22, 0x2102400b);
>   }
>   else
>   {
>     return proxy2( codeBuffer, oper, 0x22, 0x1102400b);
>   }
> }
>
> With ~/x86-toolchain-4.7.2/bin/gcc -O2 -fomit-frame-pointer -S test.c,
> gcc yields:
>
> myFunc:
> .LFB2:
>         .cfi_startproc
>         andl $15, %esi
>         cmpl $14, %esi
>         je .L6
>         jmp proxy2.isra.1
>         .p2align 4,,10
>         .p2align 3
> .L6:
>         jmp proxy1.isra.0
>
> Which can be simplified to:
>
> myFunc:
> .LFB2:
>         .cfi_startproc
>         andl $15, %esi
>         cmpl $14, %esi
>         je proxy2.isra.1 // <--- conditional sibling call here
>         .p2align 4,,10
>         .p2align 3
>         jmp proxy1.isra.0

Apart from the je/jne thinko you mentioned, the .p2align directives are
completely useless in this case: with .L6 gone, no branch target follows
them, so there is nothing left to align.

Regards,
Gabriel
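
P.S. To make the je/jne point concrete, here is a rough C model of the
three branch layouts. It is only a sketch: the enum tags, the dispatch_*
functions and the helpers are hypothetical stand-ins of mine, not code
from test.c or from gcc, and they model which proxy gets reached, nothing
more.

```c
#include <assert.h>

/* Hypothetical tags standing in for "control reached proxy1/proxy2". */
enum target { PROXY1 = 1, PROXY2 = 2 };

/* Layout gcc 4.7.2 emitted: je .L6 / jmp proxy2 / .L6: jmp proxy1 */
static enum target dispatch_emitted(unsigned int oper)
{
    if ((oper & 0xF) == 14)
        goto L6;                /* je .L6 */
    return PROXY2;              /* jmp proxy2.isra.1 */
L6:
    return PROXY1;              /* jmp proxy1.isra.0 */
}

/* Simplified layout with the condition inverted: jne proxy2 / jmp proxy1 */
static enum target dispatch_jne(unsigned int oper)
{
    if ((oper & 0xF) != 14)
        return PROXY2;          /* jne proxy2.isra.1 (conditional sibling call) */
    return PROXY1;              /* jmp proxy1.isra.0 (fall-through) */
}

/* Simplified layout exactly as quoted, with je left in place. */
static enum target dispatch_je(unsigned int oper)
{
    if ((oper & 0xF) == 14)
        return PROXY2;          /* je proxy2.isra.1 */
    return PROXY1;              /* jmp proxy1.isra.0 */
}

/* Count the inputs in 0..255 for which f disagrees with the emitted layout. */
static int mismatches(enum target (*f)(unsigned int))
{
    int n = 0;
    for (unsigned int oper = 0; oper < 256; oper++)
        if (f(oper) != dispatch_emitted(oper))
            n++;
    return n;
}

/* Self-check: the jne form always agrees, the je form never does. */
static void verify_layouts(void)
{
    assert(mismatches(dispatch_jne) == 0);
    assert(mismatches(dispatch_je) == 256);
}
```

Calling verify_layouts() from a main() passes silently: the jne form
reaches the same proxy as the emitted code for every input, while the je
form as quoted reaches the other one every time.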