https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> > which is harder for prefetchers to follow.
>
> This seems like a limitation in the HW prefetcher rather than anything else.
> Maybe the cost model for addressing mode should punish base+index if so.
> Many HW prefetchers I know of are based on the final VA (or even PA) rather
> looking at the instruction to see if it increments or not ...

That was the first thing we tried, and even increasing the cost of
register_offset to something ridiculously high doesn't change a thing.

IVopts thinks it needs to use it and generates:

  _1150 = (voidD.26 *) _1148;
  _1152 = (sizetype) l0_78(D);
  _1154 = _1152 * 324;
  _1156 = _1154 + 216;
  # VUSE <.MEM_421>
  vect__349.614_1418 = MEM <vector(2) integer(kind=4)D.9> [(integer(kind=4)D.9 *)_1150 + _1156 * 1 clique 2 base 0];

Hence the bug report to see what's going on.
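
For readers unfamiliar with the two addressing forms under discussion, here is a rough C sketch of the difference. It is not taken from the testcase: only the constants 324 and 216 come from the dump above, while the function names, the byte buffer, and the single 32-bit load (the dump loads a two-element vector) are assumptions made for illustration. A compiler is free to turn either loop into the other form, which is exactly the choice IVopts makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset as in the dump: _1156 = (sizetype) l0 * 324 + 216.
   The function name is hypothetical. */
static size_t byte_offset(size_t l0)
{
    return l0 * 324 + 216;
}

/* Base + index form: every access adds a varying offset to an
   invariant base, as in MEM[_1150 + _1156 * 1]. */
static int64_t sum_base_index(const unsigned char *base, size_t n)
{
    assert(base != NULL);
    int64_t sum = 0;
    for (size_t l0 = 0; l0 < n; l0++) {
        int32_t v;
        memcpy(&v, base + byte_offset(l0), sizeof v);
        sum += v;
    }
    return sum;
}

/* Pointer-increment form: a single pointer advances by the constant
   stride, so each access uses the pointer alone as its address.
   Assumes the buffer holds at least n * 324 + 216 bytes, so the final
   increment stays within one past the end. */
static int64_t sum_pointer_bump(const unsigned char *base, size_t n)
{
    assert(base != NULL);
    const unsigned char *p = base + 216;
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++, p += 324) {
        int32_t v;
        memcpy(&v, p, sizeof v);
        sum += v;
    }
    return sum;
}
```

Both functions read the same addresses in the same order. The only difference is whether the address is formed from a base register plus an index register or from one register that is bumped each iteration.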