https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82862
Bug ID: 82862
Summary: [8 Regression] SPEC CPU2006 465.tonto performance regression with trunk@253975 (up to 40% drop for particular loop)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: alexander.nesterovskiy at intel dot com
Target Milestone: ---

Created attachment 42552 (reproducer)
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42552&action=edit

The regression is clearly noticeable when 465.tonto is compiled with:

  -Ofast -march=core-avx2 -mfpmath=sse -funroll-loops

Changes in the cost model lead to different unrolling and vectorization decisions for a few loops, increasing their execution time by up to 60%. The regression for the whole 465.tonto benchmark is smaller, about 2-4%, because the affected loops account for less than 10% of the total workload.

Compiling with "-fopt-info-all-optall=all.optimized" and grepping for the affected line gives:

r253973:
  shell1quartet.fppized.f90:4086:0: note: loop unrolled 7 times
  shell1quartet.fppized.f90:4086:0: note: loop unrolled 7 times

r253975:
  shell1quartet.fppized.f90:4086:0: note: loop vectorized
  shell1quartet.fppized.f90:4086:0: note: loop vectorized
  shell1quartet.fppized.f90:4086:0: note: loop with 6 iterations completely unrolled
  shell1quartet.fppized.f90:4086:0: note: loop with 6 iterations completely unrolled
  shell1quartet.fppized.f90:4086:0: note: loop unrolled 3 times
  shell1quartet.fppized.f90:4086:0: note: loop unrolled 1 times

r254012 then changed the output to:
  shell1quartet.fppized.f90:4086:0: note: loop vectorized
  shell1quartet.fppized.f90:4086:0: note: loop vectorized
  shell1quartet.fppized.f90:4086:0: note: loop with 3 iterations completely unrolled
  shell1quartet.fppized.f90:4086:0: note: loop with 3 iterations completely unrolled
  shell1quartet.fppized.f90:4086:0: note: loop unrolled 3 times
  shell1quartet.fppized.f90:4086:0: note: loop unrolled 1 times

However, these particular loops still show a degradation of up to 40%.
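As a rough check on why a large per-loop slowdown shows up as only 2-4% for the whole benchmark, here is an Amdahl-style estimate. The symbols are assumptions introduced for illustration: T is the baseline runtime, f the fraction of T spent in the affected loops (under 10% per the report), and s the relative increase in those loops' execution time. Treating the 40% figure as a 40% increase in loop execution time is also an assumption, since the report does not say whether it refers to time or throughput.

```latex
T' = (1 - f)\,T + f\,(1 + s)\,T = (1 + f s)\,T,
\qquad
\frac{T' - T}{T} = f s

f < 0.10,\; s = 0.40 \;\Rightarrow\; f s < 0.04
\qquad
f < 0.10,\; s = 0.60 \;\Rightarrow\; f s < 0.06
```

Because the per-loop figures are upper bounds ("up to"), an overall slowdown at or below roughly 4% is what this estimate would predict.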
Reproducer is attached.
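When building the reproducer with different revisions, the notes for one source line can be compared mechanically instead of by eye. The following is a minimal Python sketch, assuming the dumps were written with -fopt-info-all-optall=<file> as above and use the `file:line:column: note: message` shape shown in this report. The helpers notes_for_line and diff_notes are hypothetical illustrations, not part of any GCC tooling; the embedded sample notes are copied from the r253973 and r253975 output above.

```python
import re
from collections import Counter

# Matches "<file>:<line>:<column>: note: <message>", the note shape shown in
# the report. Lines with any other shape are skipped. (Assumes no ':' in the
# file name.)
NOTE_RE = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): note: (?P<msg>.*)$")


def notes_for_line(dump_text, filename, lineno):
    """Return the note messages reported for filename:lineno, in order."""
    messages = []
    for raw in dump_text.splitlines():
        m = NOTE_RE.match(raw.strip())
        if m and m.group("file") == filename and int(m.group("line")) == lineno:
            messages.append(m.group("msg"))
    return messages


def diff_notes(old_text, new_text, filename, lineno):
    """Return (only_in_old, only_in_new) as Counters of note messages."""
    old = Counter(notes_for_line(old_text, filename, lineno))
    new = Counter(notes_for_line(new_text, filename, lineno))
    # Counter subtraction keeps only positive counts, i.e. the surplus side.
    return old - new, new - old


# Sample notes copied from the report (r253973 vs. r253975).
R253973 = """\
shell1quartet.fppized.f90:4086:0: note: loop unrolled 7 times
shell1quartet.fppized.f90:4086:0: note: loop unrolled 7 times
"""

R253975 = """\
shell1quartet.fppized.f90:4086:0: note: loop vectorized
shell1quartet.fppized.f90:4086:0: note: loop vectorized
shell1quartet.fppized.f90:4086:0: note: loop with 6 iterations completely unrolled
shell1quartet.fppized.f90:4086:0: note: loop with 6 iterations completely unrolled
shell1quartet.fppized.f90:4086:0: note: loop unrolled 3 times
shell1quartet.fppized.f90:4086:0: note: loop unrolled 1 times
"""

if __name__ == "__main__":
    gone, added = diff_notes(R253973, R253975, "shell1quartet.fppized.f90", 4086)
    for msg, n in sorted(gone.items()):
        print(f"- {n} x {msg}")
    for msg, n in sorted(added.items()):
        print(f"+ {n} x {msg}")
```

In practice the two strings would be read from the dump files of each build (for example, the contents of all.optimized), rather than embedded.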