https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56612
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---

We now try hard to generate lane extracts for those uses, but when we still fail (and know so during analysis; there is some support for "late" failures), we try to adjust the costing for this. For example,

    double x[1024];
    double y;
    double foo ()
    {
      y = x[1];
      double r = x[0];
      return r + x[1] + x[2];
    }

is currently not handled (the failure is detected during analysis) because of a ???: the use in y = x[1] comes before the last scalar stmt in the SLP node (r = x[0]), even though we emit the vector loads before the first scalar stmt. (The last stmt is the correct insert location for any other stmt, but the logic that computes the insert location is really complicated.) Swapping y = x[1] and r = x[0] creates a lane extract as requested.

vectorizable_live_operation has fallback code that refrains from replacing some uses; avoiding that fallback should be priority one.
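To make "lane extract" concrete, here is a hand-written C sketch using the GCC/Clang vector extension. It is not vectorizer output. It assumes the SLP node groups the loads of x[0] and x[1] into one two-lane vector load, and foo_lane_extract_sketch and v2df are hypothetical names introduced only for this illustration. Because the vector load is placed before the first scalar stmt, the earlier scalar use in y = x[1] can be served from lane 1 of that vector instead of keeping a separate scalar load.

    #include <string.h>

    /* GCC/Clang vector extension: two doubles in one 16-byte vector.  */
    typedef double v2df __attribute__ ((vector_size (16)));

    double x[1024];
    double y;

    /* The testcase from the comment: the use of x[1] in "y = x[1]"
       comes before "r = x[0]", the last scalar stmt of the SLP node.  */
    double foo (void)
    {
      y = x[1];
      double r = x[0];
      return r + x[1] + x[2];
    }

    /* Hypothetical illustration, not GCC output: x[0] and x[1] are loaded
       as one vector ahead of the first scalar stmt, so every use of x[1],
       including the early one in "y = ...", is a lane extract.  */
    double foo_lane_extract_sketch (void)
    {
      v2df v;
      memcpy (&v, &x[0], sizeof v);   /* vector load of x[0], x[1] */
      y = v[1];                       /* lane extract replaces scalar x[1] */
      double r = v[0];                /* lane 0 is x[0] */
      return r + v[1] + x[2];         /* same evaluation order as foo */
    }

Both functions add the same operands in the same order, so for identical inputs they return the same value and store the same y.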