https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107946
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Nope, it wasn't supposed to speed up the benchmark, but it indeed (with -Ofast)
causes the hot loop kernels to be unswitched. Btw, do we know if train and ref
data line up in these loops?

Btw, with -Ofast on znver2 I didn't observe any change when benchmarking this.
I'm trying to reproduce.

OK, so with -O2 -flto -march=znver2 and FDO I get a runtime of 173s, while
adding -fno-unswitch-loops gets me 188s. There's currently no knob to
specifically disable outer loop unswitching, so I have to patch that up
instead. With -O2 -flto -funswitch-loops (w/o FDO) I get 178s.

I'm going to add a --param to allow easier reproduction.
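
For readers unfamiliar with the transformation being discussed, here is a minimal hand-written sketch of what unswitching a loop nest on a loop-invariant condition looks like. It is a generic illustration, not the benchmark's kernel; the function names `sum_original` and `sum_unswitched` and the `scale` flag are hypothetical. The compiler does this at the IR level, and whether it fires on a given loop depends on its own cost model.

```c
#include <stddef.h>

/* Original form: the test on `scale` does not change inside either loop,
   yet it is evaluated on every inner iteration. */
double sum_original(const double *a, size_t rows, size_t cols,
                    int scale, double k)
{
    double s = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            if (scale)
                s += a[i * cols + j] * k;
            else
                s += a[i * cols + j];
        }
    }
    return s;
}

/* Hand-unswitched form: the invariant test is hoisted out of the whole
   nest and the nest is duplicated, one copy per branch. This mirrors the
   "outer loop" case, where the condition moves past both loops. */
double sum_unswitched(const double *a, size_t rows, size_t cols,
                      int scale, double k)
{
    double s = 0.0;
    if (scale) {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                s += a[i * cols + j] * k;
    } else {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                s += a[i * cols + j];
    }
    return s;
}
```

Both functions compute the same result; the second trades code size for a branch-free loop body, which is why its profitability can depend on profile data such as the train/ref inputs mentioned above.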