https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120941
--- Comment #26 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 15 Jul 2025, pheeck at gcc dot gnu.org wrote:

> --- Comment #25 from Filip Kastl <pheeck at gcc dot gnu.org> ---
> (In reply to H.J. Lu from comment #24)
> > Why is it bad for znver2?
>
> Oh, I thought we are trying to figure that out. Spilling because of register
> pressure, as richi suggested in comment 3, is the best guess we currently
> have.
>
> I'll see if I can confirm that there is some extra spilling.

For the testcase you reduced to, sanitized a bit:

enum { ST, SB, ET, EB, WT, WB };

void
LBM_initializeGrid (double *grid)
{
  grid[ST] = grid[SB] = grid[ET] = grid[EB] = grid[WT] = grid[WB] = 1.0 / 36.0;
}

the old code is

LBM_initializeGrid:
.LFB0:
        .cfi_startproc
        vmovddup        .LC1(%rip), %xmm0
        vmovupd %xmm0, 32(%rdi)
        vbroadcastsd    .LC1(%rip), %ymm0
        vmovupd %ymm0, (%rdi)
        vzeroupper
        ret

vs. the new code

LBM_initializeGrid:
.LFB0:
        .cfi_startproc
        vbroadcastsd    .LC1(%rip), %ymm0
        vmovupd %xmm0, 32(%rdi)
        vmovupd %ymm0, (%rdi)
        vzeroupper
        ret

The latter (new) version is better: it loads and broadcasts the constant once
and stores the low 128 bits of %ymm0 for the two trailing elements, instead of
loading .LC1 twice.

But if the two uses are far apart, I would expect extra spilling, as I said:
reusing the broadcast register keeps it live across everything in between. I
would have restricted the optimization to, for example, uses within a single
basic block. If we had a tunable or --param for that, you could check whether
it helps with the regressions.
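To look for the "far apart" case, a testcase along these lines might help. This is a hypothetical sketch, not from the bug: `LBM_initializeGrid_split`, `opaque_work` and `sink` are made-up names. It splits the six stores into two groups in different basic blocks with an out-of-line call in between. Under the x86-64 System V ABI every XMM/YMM register is call-clobbered, so if one broadcast were reused for both groups, the value would have to be spilled or rematerialized around the call. Whether GCC actually vectorizes the groups this way and whether the optimization fires here are assumptions to verify by compiling with something like `gcc -O3 -march=znver2 -S`.

```c
#include <assert.h>

/* Cell indices from the reduced testcase. */
enum { ST, SB, ET, EB, WT, WB };

/* Hypothetical stand-in for unrelated work.  noipa (a GCC attribute)
   keeps the call opaque to the caller, including to IPA register
   allocation, so the caller must assume all XMM/YMM registers are
   clobbered by it. */
volatile int sink;

__attribute__((noipa)) void
opaque_work (void)
{
  sink++;
}

/* Reduced testcase: all six stores sit in one basic block. */
void
LBM_initializeGrid (double *grid)
{
  grid[ST] = grid[SB] = grid[ET] = grid[EB] = grid[WT] = grid[WB] = 1.0 / 36.0;
}

/* Hypothetical variant: the same constant is stored in two groups that
   end up in different basic blocks, with a call in between.  Reusing one
   broadcast register for both groups would keep it live across the call,
   forcing a spill or a rematerialization. */
void
LBM_initializeGrid_split (double *grid, int extra_work)
{
  assert (grid != 0);
  grid[ST] = grid[SB] = grid[ET] = grid[EB] = 1.0 / 36.0;
  if (extra_work)
    opaque_work ();
  grid[WT] = grid[WB] = 1.0 / 36.0;
}
```

If the assembly for LBM_initializeGrid_split shows the broadcast register being saved to the stack around the call while LBM_initializeGrid does not, that would be the kind of extra spilling suspected here, and a --param limiting the optimization to a single basic block would be expected to avoid it.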