http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53669
Bug #: 53669 Summary: suboptimal small switch - 3-way jump with only 1 comparison Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: vermaelen.wou...@gmail.com Hi, The program below generates sub-optimal code for small (+dense) switch statements (small enough so that they are not implemented via jump-tables). The function foo() is the smallest example I could come up with that shows the problem. The function bar() shows the same problem in a slightly more realistic context. The function foo() also shows a missed opportunity for jump-threading(?), should I file a separate bug report for this? Tested on linux x86_64 with SVN revision thrunk@188550. Below I show the code generated with -Os, but -O2 and -O3 show the same missed optimization. Wouter -------8<---------8<---------8<--------8<--------8<------- void f0(); void f1(); void f2(); void f3(); // Toy example to demonstrate the problem void foo(int x) { switch (x) { case 0: f0(); break; case 1: f1(); break; default: f2(); break; } } // This is the code generated by 'g++-thrunk@188550 -Os' // _Z3fooi: testl %edi, %edi <-- 1 // je .L5 // decl %edi <-- 2 comparisons // jne .L7 // jmp .L6 <-- why not immediately jump to _Z2f1v? // .L5: jmp _Z2f0v // .L6: jmp _Z2f1v // .L7: jmp _Z2f2v // This generates optimal(?) code void my_foo(int x) { asm goto ( "cmpl $1,%[x];" // only 1 comparison "je %l[l1];" "jb %l[l0];" :: [x] "r" (x) :: l0, l1 ); l2: f2(); return; l0: f0(); return; l1: f1(); return; } // Bigger example in a slightly more realistic context: // e.g. a main loop that handles 4 elements per iteration and a switch // like this after that loop to handle the remaining elements. void bar(int x) { switch (x & 3) { case 0: f0(); break; case 1: f1(); break; case 2: f2(); break; case 3: f3(); break; } } // This is the code generated by 'g++-thrunk@188550 -Os' // _Z3bari: andl $3, %edi // cmpl $2, %edi <-- 1 // je .L11 // cmpl $3, %edi <-- 2 // je .L12 // decl %edi <-- 3 comparisons // je .L10 // jmp _Z2f0v // .L10: jmp _Z2f1v // .L11: jmp _Z2f2v // .L12: jmp _Z2f3v // This generates optimal(?) code void my_bar(int x) { asm goto ( "andl $3, %[x];" // implicit comparison with 0 "je %l[l0];" "cmpl $2, %[x];" // 1 explicit comparison "je %l[l2];" "jb %l[l1];" :: [x] "r" (x) :: l0, l1, l2 ); l3: f3(); return; l0: f0(); return; l1: f1(); return; l2: f2(); return; }