https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84481
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-04-11
     Ever confirmed|0                           |1

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #3)
> Interesting. Do I understand that correctly that it's due to increasing
> addresses of the 3 load instructions: 0x8(%rdx), 0x18(%rdx), 0x30(%rdx) vs.
> 0x18(%rdx) 0x30(%rdx) 0x8(%rdx) ?

I would guess that the hardware prefetcher might be sensitive to this.  But
note that depending on the frontend any two of the loads might issue in
parallel.  It seems this is some kind of list-walking so HW prefetching
possibly doesn't (and should not) trigger.

Anyways, it's probably a cache subsystem "issue".  Ordering memory
references might be an interesting post-reload scheduling heuristic
we could employ here.