http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51840
Bug #: 51840
Summary: asm goto incorrect code generation at -O2 and -O3
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: a...@consulting.net.nz

Created attachment 26310 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26310
asm goto incorrect code generation at -O2 and -O3

Attached is the simplest test case causing incorrect code generation that I've been able to craft. I've added comments, expanded macros by hand, eliminated many dispatch destinations and built a define switch so you can compare the code to computed goto by commenting out the #define ASM_GOTO statement.

I originally tested asm goto as a replacement for computed goto because GCC doesn't generate complex jmp instructions: jmp [m] is superior to mov [m]->r; jmp r. The complex jmp is a shorter instruction sequence overall and, if correctly predicted, may have an overall latency of zero.

asm goto requires a list of destination labels. By listing only the possible destinations of each dispatch, GCC has an opportunity to generate better code. I think I have uncovered a bug while exercising this increased optimization potential.
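For comparison, the computed-goto configuration (the one selected by commenting out #define ASM_GOTO) can be sketched as a tiny threaded interpreter. This is not the attached exec_code.c: the opcodes, the bytecode and the run_countdown function are hypothetical, loosely modelled on the label names the test case prints, and the sketch assumes GNU C labels-as-values (gcc or clang). Every handler ends with goto *labels[...], the indirect jump that, per the report, GCC emits as mov [m]->r; jmp r rather than a single jmp [m].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes, loosely modelled on the printed label names
   (atop__a_dec, atop__a_non_zero_p, atop_t__jmp_if_true, atop__exit). */
enum op { OP_PRINT, OP_DEC, OP_NON_ZERO_P, OP_JMP_IF_TRUE, OP_EXIT };

struct insn {
    enum op op;
    int arg;                     /* jump target for OP_JMP_IF_TRUE */
};

/* Counts atop down from `start` to 1, recording each value in `trace`
   (at most `max` entries). Returns the number of values recorded. */
int run_countdown(int start, int *trace, int max)
{
    static const struct insn code[] = {
        { OP_PRINT, 0 },
        { OP_DEC, 0 },
        { OP_NON_ZERO_P, 0 },
        { OP_JMP_IF_TRUE, 0 },   /* back to pc 0 while atop != 0 */
        { OP_EXIT, 0 },
    };
    /* GNU C: one label address per opcode, indexed by enum op. */
    static const void *const labels[] = {
        &&op_print, &&op_dec, &&op_non_zero_p, &&op_jmp_if_true, &&op_exit,
    };
    int atop = start, flag = 0, n = 0;
    size_t pc = 0;

    assert(trace != NULL || max == 0);
    if (start <= 0)              /* the loop only terminates for start > 0 */
        return 0;

/* Computed goto: jump straight to the handler of the next instruction. */
#define DISPATCH() goto *labels[code[pc].op]
    DISPATCH();

op_print:
    if (n < max)
        trace[n++] = atop;
    pc++;
    DISPATCH();
op_dec:
    atop--;
    pc++;
    DISPATCH();
op_non_zero_p:
    flag = (atop != 0);
    pc++;
    DISPATCH();
op_jmp_if_true:
    pc = flag ? (size_t)code[pc].arg : pc + 1;
    DISPATCH();
op_exit:
    return n;
#undef DISPATCH
}
```

Calling run_countdown(10, trace, 16) records 10 down to 1, the same sequence the correct runs print as "atop = 10" ... "atop = 1".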
Here's the problem (with gcc 4.5, 4.6 or snapshot; the printed destination addresses may differ):

  $ gcc -Wall -O0 -std=gnu99 exec_code.c && ./a.out
  atop__a_dec: 0x400648
  atop__a_non_zero_p: 0x40068c
  atop__exit: 0x400735
  atop_f__jmp_if_true: 0x4006cc
  atop_t__jmp_if_true: 0x4006f2
  fixme: 0x400742
  atop = 10
  atop = 9
  atop = 8
  atop = 7
  atop = 6
  atop = 5
  atop = 4
  atop = 3
  atop = 2
  atop = 1

  $ gcc -Wall -O1 -std=gnu99 exec_code.c && ./a.out
  atop__a_dec: 0x400634
  atop__a_non_zero_p: 0x40066b
  atop__exit: 0x4006ad
  atop_f__jmp_if_true: 0x400688
  atop_t__jmp_if_true: 0x400698
  fixme: 0x4006ca
  atop = 10
  atop = 9
  atop = 8
  atop = 7
  atop = 6
  atop = 5
  atop = 4
  atop = 3
  atop = 2
  atop = 1

  $ gcc -Wall -O2 -std=gnu99 exec_code.c && ./a.out
  atop__a_dec: 0x400657
  atop__a_non_zero_p: 0x4006ea
  atop__exit: 0x400678
  atop_f__jmp_if_true: 0x400710
  atop_t__jmp_if_true: 0x4006f8
  fixme: 0x400660
  Dispatch logic ERROR

  $ gcc -Wall -O3 -std=gnu99 exec_code.c && ./a.out
  atop__a_dec: 0x400657
  atop__a_non_zero_p: 0x4006ea
  atop__exit: 0x400678
  atop_f__jmp_if_true: 0x4006bc
  atop_t__jmp_if_true: 0x400700
  fixme: 0x400660
  Dispatch logic ERROR

I am surprised that -Os correctly counts down:

  $ gcc -Wall -Os -std=gnu99 exec_code.c && ./a.out
  atop__a_dec: 0x400636
  atop__a_non_zero_p: 0x40065f
  atop__exit: 0x40069b
  atop_f__jmp_if_true: 0x40067b
  atop_t__jmp_if_true: 0x400686
  fixme: 0x4006a7
  atop = 10
  atop = 9
  atop = 8
  atop = 7
  atop = 6
  atop = 5
  atop = 4
  atop = 3
  atop = 2
  atop = 1

This may narrow the bug down to one of the optimisations performed at -O2 and -O3 that is not performed at -Os.

Any level of optimization is OK when #define ASM_GOTO is commented out:

  $ gcc -Wall -O3 -std=gnu99 exec_code.c && ./a.out
  atop__a_dec: 0x400660
  atop__a_non_zero_p: 0x400690
  atop__exit: 0x4006e0
  atop_f__jmp_if_true: 0x4006b0
  atop_t__jmp_if_true: 0x4006c8
  fixme: 0x400700
  atop = 10
  atop = 9
  atop = 8
  atop = 7
  atop = 6
  atop = 5
  atop = 4
  atop = 3
  atop = 2
  atop = 1