It looks like gcc sometimes produces "useless" conditional branches. I've found code like this:

    xor    %edx,%edx
    ; code with no effect on edx (see full code below)
    test   %edx,%edx
    jne    <somewhere else>

The branch on the last line is never taken. Why does gcc generate such code sequences? Is this patched at runtime, or something? Am I missing something obvious here?

I append the function's complete code below. There is another suspicious branch at 0xb31cd8 (never taken, for less obvious reasons---edx is never zero at that point).

I have found hundreds of such occurrences across the CPU2006 suite. Does anybody have any idea why this happens? Is there any specific optimization to enable or disable to avoid such dead edges?

Thanks in advance for any remark/idea/...

This code is from 416.gamess (from SPEC CPU2006), function "formf", compiled with "gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)" (from a stock Ubuntu 9.10), with options "-g -O3 -march=native -fno-optimize-sibling-calls". 416.gamess is compiled with gfortran, but the same thing happens with C or C++ programs. The same also happens at lower optimization levels (-O1), but less frequently.

uname -m gives "x86_64", and /proc/cpuinfo contains:

    vendor_id  : GenuineIntel
    cpu family : 6
    model      : 23
    model name : Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
    [...]
    flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida tpr_shadow vnmi flexpriority

Here is an objdump disassembly of the code, broken into basic-blocks:

0000000000b31c60 <formf_>:
  b31c60: push   %rbx
  b31c61: xor    %edx,%edx
  b31c63: mov    %rsi,%rbx
  b31c66: mov    $0x2,%r9d
  b31c6c: mov    $0xa,%r8d
  b31c72: mov    $0x6,%esi
  b31c77: mov    $0xe,%ecx
  b31c7c: test   %edx,%edx
  b31c7e: jne    b31cda <formf_+0x7a>

  b31c80: mov    $0x3ff8000000000000,%r11
  b31c8a: mov    $0xbfe0000000000000,%r10
  b31c94: mov    %r11,(%rdi)
  b31c97: movq   $0x0,0x40(%rdi)
  b31c9f: movq   $0x0,0x20(%rdi)
  b31ca7: mov    %r10,0x60(%rdi)
  b31cab: xor    %eax,%eax
  b31cad: nopl   (%rax)

  b31cb0: inc    %edx
  b31cb2: movq   $0x0,0x10(%rdi,%rax,8)
  b31cbb: movq   $0x0,0x50(%rdi,%rax,8)
  b31cc4: movq   $0x0,0x30(%rdi,%rax,8)
  b31ccd: movq   $0x0,0x70(%rdi,%rax,8)
  b31cd6: test   %edx,%edx
  b31cd8: je     b31c80 <formf_+0x20>

  b31cda: cmp    $0x2,%edx
  b31cdd: jne    b31d18 <formf_+0xb8>

  b31cdf: mov    $0xbfe0000000000000,%r11
  b31ce9: mov    $0xbfe0000000000000,%r10
  b31cf3: mov    %r11,(%rdi,%r9,8)
  b31cf7: mov    $0x2,%eax
  b31cfc: movq   $0x0,(%rdi,%r8,8)
  b31d04: movq   $0x0,(%rdi,%rsi,8)
  b31d0c: mov    %r10,(%rdi,%rcx,8)
  b31d10: jmp    b31cb0 <formf_+0x50>
  b31d12: nopw   0x0(%rax,%rax,1)

  b31d18: movslq %edx,%rax
  b31d1b: cmp    $0x1,%edx
  b31d1e: movq   $0x0,(%rdi,%rax,8)
  b31d26: movq   $0x0,0x40(%rdi,%rax,8)
  b31d2f: movq   $0x0,0x20(%rdi,%rax,8)
  b31d38: movq   $0x0,0x60(%rdi,%rax,8)
  b31d41: jne    b31cb0 <formf_+0x50>

  b31d47: movq   $0x0,0x50(%rdi,%rax,8)
  b31d50: movq   $0x0,0x30(%rdi,%rax,8)
  b31d59: mov    $0x3fe0000000000000,%r9
  b31d63: mov    $0xbff8000000000000,%r8
  b31d6d: mov    %r9,0x10(%rdi,%rax,8)
  b31d72: mov    %r8,0x70(%rdi,%rax,8)
  b31d77: mov    $0xd0f838,%edx
  b31d7c: mov    $0xd0f7b0,%esi
  b31d81: mov    %rbx,%rdi
  b31d84: xor    %eax,%eax
  b31d86: callq  977b20 <vclr_>
  b31d8b: mov    $0x3fe0000000000000,%rsi
  b31d95: mov    $0x3fe0000000000000,%rcx
  b31d9f: mov    %rsi,(%rbx)
  b31da2: mov    %rcx,0x60(%rbx)
  b31da6: mov    $0xbfe0000000000000,%rdx
  b31db0: mov    $0xbfe0000000000000,%rax
  b31dba: mov    %rdx,0x18(%rbx)
  b31dbe: mov    %rax,0x78(%rbx)
  b31dc2: pop    %rbx
  b31dc3: retq

Let me know if more detail is needed.

-- Alain.
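
P.S. For anyone who wants to check the claim that the branches at 0xb31c7e and 0xb31cd8 are dead, the control flow of the listing can be modeled in a few lines of C. This is only a sketch of one reading of the disassembly: it tracks nothing but the value of %edx, omits every store and the call to vclr_, and the names model_formf and branch_counts are made up for illustration (they are not part of gamess or gcc).

```c
#include <stdio.h>

/* Hypothetical helper type: how often each conditional branch is taken. */
struct branch_counts {
    int jne_b31c7e;   /* test %edx,%edx ; jne b31cda */
    int je_b31cd8;    /* test %edx,%edx ; je  b31c80 */
    int jne_b31cdd;   /* cmp $0x2,%edx  ; jne b31d18 */
    int jne_b31d41;   /* cmp $0x1,%edx  ; jne b31cb0 */
};

/* Model of formf_ that follows only %edx; labels mirror the addresses. */
struct branch_counts model_formf(void)
{
    struct branch_counts n = {0, 0, 0, 0};
    unsigned edx = 0;            /* b31c61: xor %edx,%edx */

    if (edx != 0) {              /* b31c7c, b31c7e */
        n.jne_b31c7e++;
        goto L_b31cda;
    }
L_b31c80:
    /* stores through %rdi and "xor %eax,%eax" omitted */
L_b31cb0:
    edx++;                       /* b31cb0: inc %edx */
    if (edx == 0) {              /* b31cd6, b31cd8 */
        n.je_b31cd8++;
        goto L_b31c80;
    }
L_b31cda:
    if (edx != 2) {              /* b31cda, b31cdd */
        n.jne_b31cdd++;
        goto L_b31d18;
    }
    goto L_b31cb0;               /* b31d10: jmp b31cb0 */
L_b31d18:
    if (edx != 1) {              /* b31d1b, b31d41 */
        n.jne_b31d41++;
        goto L_b31cb0;
    }
    return n;                    /* b31d47 onwards: falls through to retq */
}

/* Convenience printer for the counters. */
void print_counts(const struct branch_counts *n)
{
    printf("jne@b31c7e=%d je@b31cd8=%d jne@b31cdd=%d jne@b31d41=%d\n",
           n->jne_b31c7e, n->je_b31cd8, n->jne_b31cdd, n->jne_b31d41);
}
```

Calling model_formf() from a small main() and passing the result to print_counts() reports zero for both jne@b31c7e and je@b31cd8 under this model, which matches the observation that neither branch is ever taken.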