https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86702

            Bug ID: 86702
           Summary: [8/9 Regression] SPEC CPU2006 400.perlbench, CPU2017
                    500.perlbench_r ~3% performance drop after r262247
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.nesterovskiy at intel dot com
  Target Milestone: ---

Created attachment 44453
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44453&action=edit
reproducer

It looks like branch probability information is lost in some cases during
tree-switchlower1 after r262247.

As a result, SPEC CPU2006/2017 perlbench drops by ~3% with some particular
compilation options/HW configurations, because of heavier spilling/filling
in a hot block.

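It can be illustrated with a small example. In the r262247 dump, every edge
probability of the final switch is printed as [INV] instead of a percentage: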
---
$ cat > reproducer.c
int foo(int bar)
{
    switch (bar)
    {
    case 0:
        return bar + 5;
    case 1:
        return bar - 4;
    case 2:
        return bar + 3;
    case 3:
        return bar - 2;
    case 4:
        return bar + 1;
    case 5:
        return bar;
    default:
        return 0;
    }
}
^Z
[2]+  Stopped                 cat > reproducer.c
$ ./r262246/bin/gcc -m64 -c -o /dev/null -O1 \
    -fdump-tree-switchlower1=r262246_168t.switchlower1 reproducer.c
$ ./r262247/bin/gcc -m64 -c -o /dev/null -O1 \
    -fdump-tree-switchlower1=r262247_168t.switchlower1 reproducer.c
$ cat r262246_168t.switchlower1

;; Function foo (foo, funcdef_no=0, decl_uid=2007, cgraph_uid=1,
symbol_order=0)

beginning to process the following SWITCH statement (reproducer.c:3) : -------
switch (bar_2(D)) <default: <L6> [14.29%], case 0: <L9> [57.14%], case 1: <L8>
[14.29%], case 2: <L9> [57.14%], case 3: <L3> [14.29%], case 4 ... 5: <L9>
[57.14%]>

;; GIMPLE switch case clusters: JT:0-5
Removing basic block 9
Merging blocks 2 and 8
Merging blocks 2 and 7

Symbols to be put in SSA form
{ D.2019 }
Incremental SSA update started at block: 0
Number of blocks in CFG: 7
Number of blocks to update: 6 ( 86%)


foo (int bar)
{
  int _1;

  <bb 2> [local count: 1073419798]:
  switch (bar_2(D)) <default: <L6> [14.29%], case 0: <L9> [57.14%], case 1:
<L8> [14.29%], case 2: <L9> [57.14%], case 3: <L3> [14.29%], case 4 ... 5: <L9>
[57.14%]>

  <bb 3> [local count: 613382737]:
<L9>:
  goto <bb 6>; [100.00%]

  <bb 4> [local count: 153391689]:
<L3>:
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 153391689]:
<L6>:

  <bb 6> [local count: 1073741825]:
  # _1 = PHI <0(5), -3(2), 1(4), 5(3)>
<L8>:
  return _1;

}


$ cat r262247_168t.switchlower1

;; Function foo (foo, funcdef_no=0, decl_uid=2007, cgraph_uid=1,
symbol_order=0)

beginning to process the following SWITCH statement (reproducer.c:3) : -------
switch (bar_2(D)) <default: <L6> [14.29%], case 0: <L9> [57.14%], case 1: <L8>
[14.29%], case 2: <L9> [57.14%], case 3: <L3> [14.29%], case 4 ... 5: <L9>
[57.14%]>

;; GIMPLE switch case clusters: JT:0-5
Removing basic block 7
Merging blocks 2 and 8
Merging blocks 2 and 9

Symbols to be put in SSA form
{ D.2019 }
Incremental SSA update started at block: 0
Number of blocks in CFG: 7
Number of blocks to update: 6 ( 86%)


foo (int bar)
{
  int _1;

  <bb 2> [local count: 1073419798]:
  switch (bar_2(D)) <default: <L6> [INV], case 0: <L9> [INV], case 1: <L8>
[INV], case 2: <L9> [INV], case 3: <L3> [INV], case 4 ... 5: <L9> [INV]>

  <bb 3> [local count: 613382737]:
<L9>:
  goto <bb 6>; [100.00%]

  <bb 4> [local count: 153391689]:
<L3>:
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 153391689]:
<L6>:

  <bb 6> [local count: 1073741825]:
  # _1 = PHI <0(5), -3(2), 1(4), 5(3)>
<L8>:
  return _1;

}
---

The same happens on current trunk (I tried r263027).
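
For checking other revisions or larger dumps, a small helper along these
lines might be handy. It is only a sketch: it assumes that [INV] is the marker
the dump prints for an edge probability that is not initialized (as seen in
the r262247 dump above), and the function name count_inv_markers is
hypothetical, not part of GCC. It counts the markers in dump text that has
already been read into memory.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of the literal marker "[INV]" in the text of a
   switchlower1 dump.  Assumption: a non-zero result means that at least
   one edge probability was printed as invalid.  */
size_t count_inv_markers(const char *dump_text)
{
    static const char marker[] = "[INV]";
    size_t count = 0;
    const char *p = dump_text;

    /* strstr returns NULL once no further match exists.  */
    while ((p = strstr(p, marker)) != NULL) {
        count++;
        p += sizeof marker - 1;  /* continue after this match */
    }
    return count;
}
```

Fed the final switch statement from the two dumps above, this should report
zero for r262246 and six for r262247. Reading the dump file into a buffer is
left to the caller.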
