https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56612

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We now try hard to generate lane extracts for those uses. When that still
fails, and we know so during analysis (there's some support for "late" fails),
we try to adjust the costing for this.

double x[1024];
double y;
double foo ()
{
  y = x[1];
  double r = x[0];
  return r + x[1] + x[2];
}

is, for example, currently not handled (this is detected during analysis)
because of a ???: the use in y = x[1] comes before the last scalar stmt in
the SLP node (r = x[0]), even though we emit vector loads before the
first scalar stmt (last is correct for any other stmt, but the story to
compute the insert location is really complicated).
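
To make "lane extract" concrete for the y = x[1] use, here is a hand-written
sketch using GCC's generic vector extensions. It is only an illustration of
the idea, not the code the vectorizer emits; the names v2df and foo_by_hand
are made up for this example, and a 2-lane double vector is assumed.

```c
/* Hypothetical 2-lane vector type, assumed for illustration only.  */
typedef double v2df __attribute__ ((vector_size (16)));

double x[1024];
double y;

double foo_by_hand (void)
{
  v2df v;
  /* One vector load covering x[0] and x[1], placed before any scalar use.  */
  __builtin_memcpy (&v, &x[0], sizeof v);
  y = v[1];             /* lane extract instead of a second scalar load */
  double r = v[0];      /* lane extract for the other element */
  return r + v[1] + x[2];
}
```

The point is that the store to y reads lane 1 of the already loaded vector,
which only works if the vector load is available before that use.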

Swapping y = x[1] and r = x[0] creates a lane extract as requested.
vectorizable_live_operation has fallback code that refrains from replacing
some uses; avoiding that fallback should be priority one.
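
For reference, the swapped ordering mentioned above, spelled out. The function
name foo_swapped is only used here to tell it apart from foo; nothing else
changes.

```c
double x[1024];
double y;

/* Same as foo (), with y = x[1] and r = x[0] swapped so that the use of
   x[1] comes after the last scalar stmt of the load group.  */
double foo_swapped (void)
{
  double r = x[0];
  y = x[1];
  return r + x[1] + x[2];
}
```

Both orderings compute the same values; only the position of the y = x[1]
use relative to r = x[0] differs.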
