http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51840

             Bug #: 51840
           Summary: asm goto incorrect code generation at -O2 and -O3
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: a...@consulting.net.nz


Created attachment 26310
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26310
asm goto incorrect code generation at -O2 and -O3

Attached is the simplest test case causing incorrect code generation I've been
able to craft. I've added comments, expanded out macros by hand, eliminated
many dispatch destinations and built a define switch so you can compare the
code to computed goto by commenting out the #define ASM_GOTO statement.

I originally tested asm goto as a replacement for computed goto because GCC
doesn't generate complex jmp instructions [jmp [m] is superior to mov [m]->r;
jmp r. The complex jmp is a shorter instruction sequence overall and, if
correctly predicted, may have an overall latency of zero].
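
For readers unfamiliar with computed goto, here is a minimal sketch of the
kind of dispatch loop being discussed. It is not the attached test case: the
opcode names, the program encoding and the run_countdown function are all
made up for illustration. It only shows the GNU C &&label / goto * extension
that asm goto was meant to replace.

```c
#include <stdio.h>

/* Hypothetical opcodes; these are NOT the names used in the attachment. */
enum { OP_PRINT, OP_DEC, OP_JMP_IF_NONZERO, OP_EXIT };

/* Runs a tiny countdown program and returns how many values were printed.
   Assumes atop >= 1 on entry. */
static int run_countdown(int atop)
{
    /* Table of label addresses indexed by opcode (GNU C extension). */
    static void *const dispatch[] = {
        &&op_print, &&op_dec, &&op_jmp_if_nonzero, &&op_exit
    };
    /* Program: print; decrement; jump back to index 0 if nonzero; exit. */
    static const int code[] = {
        OP_PRINT, OP_DEC, OP_JMP_IF_NONZERO, 0, OP_EXIT
    };
    const int *pc = code;
    int printed = 0;

    goto *dispatch[*pc++];          /* initial dispatch */

op_print:
    printf("atop = %d\n", atop);
    printed++;
    goto *dispatch[*pc++];
op_dec:
    atop--;
    goto *dispatch[*pc++];
op_jmp_if_nonzero:
    if (atop != 0)
        pc = code + *pc;            /* operand is the jump target index */
    else
        pc++;                       /* skip the operand */
    goto *dispatch[*pc++];
op_exit:
    return printed;
}
```

Calling run_countdown(10) from main prints the ten "atop = N" lines in the
same form as the transcripts below. Each "goto *dispatch[*pc++]" is the place
where the compiler has to choose between loading the target into a register
first and jumping through memory directly.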

asm goto requires a list of destination labels. By listing only the possible
destinations of each dispatch, GCC has an opportunity to generate better code.
I think I have uncovered a bug while exercising this increased optimization
potential.

Here's the problem (with gcc 4.5, 4.6 or snapshot; the printed destination
addresses may differ):

$ gcc -Wall -O0 -std=gnu99 exec_code.c && ./a.out 
atop__a_dec:         0x400648
atop__a_non_zero_p:  0x40068c
atop__exit:          0x400735
atop_f__jmp_if_true: 0x4006cc
atop_t__jmp_if_true: 0x4006f2
fixme:               0x400742
atop = 10
atop = 9
atop = 8
atop = 7
atop = 6
atop = 5
atop = 4
atop = 3
atop = 2
atop = 1

$ gcc -Wall -O1 -std=gnu99 exec_code.c && ./a.out 
atop__a_dec:         0x400634
atop__a_non_zero_p:  0x40066b
atop__exit:          0x4006ad
atop_f__jmp_if_true: 0x400688
atop_t__jmp_if_true: 0x400698
fixme:               0x4006ca
atop = 10
atop = 9
atop = 8
atop = 7
atop = 6
atop = 5
atop = 4
atop = 3
atop = 2
atop = 1

$ gcc -Wall -O2 -std=gnu99 exec_code.c && ./a.out 
atop__a_dec:         0x400657
atop__a_non_zero_p:  0x4006ea
atop__exit:          0x400678
atop_f__jmp_if_true: 0x400710
atop_t__jmp_if_true: 0x4006f8
fixme:               0x400660
Dispatch logic ERROR

$ gcc -Wall -O3 -std=gnu99 exec_code.c && ./a.out 
atop__a_dec:         0x400657
atop__a_non_zero_p:  0x4006ea
atop__exit:          0x400678
atop_f__jmp_if_true: 0x4006bc
atop_t__jmp_if_true: 0x400700
fixme:               0x400660
Dispatch logic ERROR


I am surprised that -Os correctly counts down:

$ gcc -Wall -Os -std=gnu99 exec_code.c && ./a.out 
atop__a_dec:         0x400636
atop__a_non_zero_p:  0x40065f
atop__exit:          0x40069b
atop_f__jmp_if_true: 0x40067b
atop_t__jmp_if_true: 0x400686
fixme:               0x4006a7
atop = 10
atop = 9
atop = 8
atop = 7
atop = 6
atop = 5
atop = 4
atop = 3
atop = 2
atop = 1


This may narrow the bug down to one of the optimisations performed at -O2 and
-O3 that is not performed at -Os.


Any level of optimization is OK when #define ASM_GOTO is commented out:

$ gcc -Wall -O3 -std=gnu99 exec_code.c && ./a.out 
atop__a_dec:         0x400660
atop__a_non_zero_p:  0x400690
atop__exit:          0x4006e0
atop_f__jmp_if_true: 0x4006b0
atop_t__jmp_if_true: 0x4006c8
fixme:               0x400700
atop = 10
atop = 9
atop = 8
atop = 7
atop = 6
atop = 5
atop = 4
atop = 3
atop = 2
atop = 1
