https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86702
Bug ID: 86702
Summary: [8/9 Regression] SPEC CPU2006 400.perlbench, CPU2017 500.perlbench_r ~3% performance drop after r262247
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: alexander.nesterovskiy at intel dot com
Target Milestone: ---

Created attachment 44453
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44453&action=edit
reproducer

It looks like some branch probability information is lost in some cases after
r262247 during tree-switchlower1.
As a result, SPEC CPU2006/2017 perlbench slows down by ~3% with some particular
compilation options/HW configurations because of heavier spilling/filling in a
hot block.

It can be illustrated with a small example:
---
$ cat > reproducer.c
int foo(int bar)
{
    switch (bar)
    {
        case 0:
            return bar + 5;
        case 1:
            return bar - 4;
        case 2:
            return bar + 3;
        case 3:
            return bar - 2;
        case 4:
            return bar + 1;
        case 5:
            return bar;
        default:
            return 0;
    }
}
^Z
[2]+  Stopped                 cat > reproducer.c

$ ./r262246/bin/gcc -m64 -c -o /dev/null -O1 -fdump-tree-switchlower1=r262246_168t.switchlower1 reproducer.c
$ ./r262247/bin/gcc -m64 -c -o /dev/null -O1 -fdump-tree-switchlower1=r262247_168t.switchlower1 reproducer.c

$ cat r262246_168t.switchlower1

;; Function foo (foo, funcdef_no=0, decl_uid=2007, cgraph_uid=1, symbol_order=0)

beginning to process the following SWITCH statement (reproducer.c:3) : -------
switch (bar_2(D)) <default: <L6> [14.29%], case 0: <L9> [57.14%], case 1: <L8> [14.29%], case 2: <L9> [57.14%], case 3: <L3> [14.29%], case 4 ... 5: <L9> [57.14%]>

;; GIMPLE switch case clusters: JT:0-5
Removing basic block 9
Merging blocks 2 and 8
Merging blocks 2 and 7

Symbols to be put in SSA form
{ D.2019 }
Incremental SSA update started at block: 0
Number of blocks in CFG: 7
Number of blocks to update: 6 ( 86%)

foo (int bar)
{
  int _1;

  <bb 2> [local count: 1073419798]:
  switch (bar_2(D)) <default: <L6> [14.29%], case 0: <L9> [57.14%], case 1: <L8> [14.29%], case 2: <L9> [57.14%], case 3: <L3> [14.29%], case 4 ... 5: <L9> [57.14%]>

  <bb 3> [local count: 613382737]:
<L9>:
  goto <bb 6>; [100.00%]

  <bb 4> [local count: 153391689]:
<L3>:
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 153391689]:
<L6>:

  <bb 6> [local count: 1073741825]:
  # _1 = PHI <0(5), -3(2), 1(4), 5(3)>
<L8>:
  return _1;

}

$ cat r262247_168t.switchlower1

;; Function foo (foo, funcdef_no=0, decl_uid=2007, cgraph_uid=1, symbol_order=0)

beginning to process the following SWITCH statement (reproducer.c:3) : -------
switch (bar_2(D)) <default: <L6> [14.29%], case 0: <L9> [57.14%], case 1: <L8> [14.29%], case 2: <L9> [57.14%], case 3: <L3> [14.29%], case 4 ... 5: <L9> [57.14%]>

;; GIMPLE switch case clusters: JT:0-5
Removing basic block 7
Merging blocks 2 and 8
Merging blocks 2 and 9

Symbols to be put in SSA form
{ D.2019 }
Incremental SSA update started at block: 0
Number of blocks in CFG: 7
Number of blocks to update: 6 ( 86%)

foo (int bar)
{
  int _1;

  <bb 2> [local count: 1073419798]:
  switch (bar_2(D)) <default: <L6> [INV], case 0: <L9> [INV], case 1: <L8> [INV], case 2: <L9> [INV], case 3: <L3> [INV], case 4 ... 5: <L9> [INV]>

  <bb 3> [local count: 613382737]:
<L9>:
  goto <bb 6>; [100.00%]

  <bb 4> [local count: 153391689]:
<L3>:
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 153391689]:
<L6>:

  <bb 6> [local count: 1073741825]:
  # _1 = PHI <0(5), -3(2), 1(4), 5(3)>
<L8>:
  return _1;

}
---

The same happens on current trunk (I tried r263027).