http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
Summary: Horrible code generation for trivial decrement with test
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: torva...@linux-foundation.org

This trivial code:

    extern void fncall(void *);

    int main(int argc, char **argv)
    {
        if (!--*argv)
            fncall(argv);
        return 0;
    }

compiles into this ridiculous x86-64 assembly language:

    movq    (%rsi), %rax
    subq    $1, %rax
    testq   %rax, %rax
    movq    %rax, (%rsi)
    je      .L4

for the "decrement and test result" at -O2.

I'd have expected that any reasonable compiler would generate something like

    decq    (%rsi)
    je      .L4

instead, which would be smaller and faster (even a "subq $1" would be fine, but the decq is one byte shorter).

The problem is more noticeable when the memory location is a structure offset, when the "load+decrement+store" model really results in relatively much bigger code due to the silly repetition of the memory address, for absolutely no advantage.

Is there some way that I haven't found to make gcc use the rmw instructions?
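
The structure-offset case mentioned above could be reproduced with something like the following. This is only a sketch: the names `struct obj`, `pad`, `refcount`, and `obj_put` are made up for illustration and do not come from the report. With four leading `long` members, the counter would sit at a non-zero offset (32 bytes on an LP64 x86-64 target), so a load+decrement+store sequence has to encode that displacement twice, while a single read-modify-write instruction would encode it once.

```c
/* Hypothetical structure, not from the report: the padding only
 * exists to place the counter at a non-zero offset. */
struct obj {
    long pad[4];
    long refcount;
};

/* Decrement the counter in memory and return 1 if it reached zero,
 * 0 otherwise. This is the same "decrement and test result" pattern
 * as in the report, applied to a structure member. */
int obj_put(struct obj *o)
{
    return --o->refcount == 0;
}
```

Compiling this with "gcc -O2 -S" and reading the output for obj_put should show whether the decrement is done with one rmw instruction or split into separate load, subtract, and store steps. The result will depend on the compiler version and target.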