https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82862

            Bug ID: 82862
           Summary: [8 Regression] SPEC CPU2006 465.tonto performance
                    regression with trunk@253975 (up to 40% drop for
                    particular loop)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.nesterovskiy at intel dot com
  Target Milestone: ---

Created attachment 42552
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42552&action=edit
reproducer

The regression is clearly noticeable when 465.tonto is compiled with:
-Ofast -march=core-avx2 -mfpmath=sse -funroll-loops

Changes in the cost model lead to changes in the unrolling and vectorization
of a few loops and increase their execution time by up to 60%.
The regression of the whole 465.tonto benchmark is smaller, about 2-4%,
because the affected loops account for less than 10% of the whole workload.
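
As a rough consistency check of these numbers, here is a minimal sketch. It assumes the rest of the benchmark is unaffected, so the overall increase in run time is simply the loops' share of run time multiplied by their own slowdown. The 4% and 7% shares are hypothetical values picked for illustration (the report only says "less than 10%"), and the function name is made up for this sketch.

```python
def overall_slowdown(loop_fraction, loop_slowdown):
    """Relative increase in total run time.

    loop_fraction: share of the original run time spent in the affected loops
    loop_slowdown: relative increase of the loops' own execution time
    Derivation: new_time = (1 - f) + f * (1 + s) = 1 + f * s
    """
    return loop_fraction * loop_slowdown


# Hypothetical loop shares; only "less than 10%" is stated in the report.
for fraction in (0.04, 0.07):
    total = overall_slowdown(fraction, 0.60)
    print(f"loops at {fraction:.0%} of run time, 60% slower: {total:.1%} overall")
```

Under this assumption, a 60% slowdown in loops taking a few percent of the run time gives an overall regression in the low single digits, which is in line with the observed 2-4%.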

Compiling with "-fopt-info-all-optall=all.optimized" and grepping for the
particular line:
r253973:
 shell1quartet.fppized.f90:4086:0: note: loop unrolled 7 times
 shell1quartet.fppized.f90:4086:0: note: loop unrolled 7 times

r253975:
 shell1quartet.fppized.f90:4086:0: note: loop vectorized
 shell1quartet.fppized.f90:4086:0: note: loop vectorized
 shell1quartet.fppized.f90:4086:0: note: loop with 6 iterations completely unrolled
 shell1quartet.fppized.f90:4086:0: note: loop with 6 iterations completely unrolled
 shell1quartet.fppized.f90:4086:0: note: loop unrolled 3 times
 shell1quartet.fppized.f90:4086:0: note: loop unrolled 1 times

There was a change introduced by r254012: 
 shell1quartet.fppized.f90:4086:0: note: loop vectorized
 shell1quartet.fppized.f90:4086:0: note: loop vectorized
 shell1quartet.fppized.f90:4086:0: note: loop with 3 iterations completely unrolled
 shell1quartet.fppized.f90:4086:0: note: loop with 3 iterations completely unrolled
 shell1quartet.fppized.f90:4086:0: note: loop unrolled 3 times
 shell1quartet.fppized.f90:4086:0: note: loop unrolled 1 times

But these particular loops are still degraded by up to 40%.
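
The per-revision comparison above was done with grep. A minimal sketch of the same filtering in Python is shown here, in case it helps to tally the notes for one source location. It assumes each note occupies a single line of the dump file in the "file:line:column: note: message" form quoted above. The function name is hypothetical and the sample input is copied from the r253973 output.

```python
from collections import Counter


def notes_for(location, lines):
    """Count each distinct 'note:' message reported for one source location."""
    prefix = location + ": note: "
    counts = Counter()
    for line in lines:
        text = line.strip()
        if text.startswith(prefix):
            counts[text[len(prefix):]] += 1
    return counts


# Sample lines copied from the r253973 output quoted in this report.
sample = [
    " shell1quartet.fppized.f90:4086:0: note: loop unrolled 7 times",
    " shell1quartet.fppized.f90:4086:0: note: loop unrolled 7 times",
]
for message, count in notes_for("shell1quartet.fppized.f90:4086:0", sample).items():
    print(count, message)
```

To run it on a real dump, pass the open "all.optimized" file object as the lines argument instead of the sample list.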

A reproducer is attached.
