https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89226

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
The optimized dump for copy1 looks like

  *to_2(D) = *from_3(D);

so we get essentially memcpy, while copy2 has

  _4 = MEM[(const struct foo512 &)from_3(D)].a;
  MEM[(struct foo512 *)to_2(D)].a = _4;
  _5 = MEM[(const struct foo512 &)from_3(D)].b;
  MEM[(struct foo512 *)to_2(D)].b = _5;

which we expand literally.
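
For readers without the original test case at hand, here is a minimal sketch
of the kind of source that would give two dumps of this shape. Only the names
foo512, copy1, copy2, a and b come from the dumps above. The member type (a
32-byte GCC vector), the exact signatures and the check_copies helper are
assumptions made for illustration, not the reporter's code.

```cpp
#include <cassert>
#include <cstring>

// Assumption: each member is a 256-bit GCC vector, so the struct is 512 bits.
typedef long long v4di __attribute__((vector_size(32)));

struct foo512 {
    v4di a;
    v4di b;
};
static_assert(sizeof(foo512) == 64, "foo512 is expected to be 64 bytes");

// Whole-object assignment: the form that shows up as *to = *from.
void copy1(foo512 *to, const foo512 &from) {
    *to = from;
}

// Member-by-member assignment: the form that shows up as two load/store pairs.
void copy2(foo512 *to, const foo512 &from) {
    to->a = from.a;
    to->b = from.b;
}

// Hypothetical helper: both functions must produce a byte-identical copy.
void check_copies() {
    const v4di lo = {1, 2, 3, 4};
    const v4di hi = {5, 6, 7, 8};
    foo512 src;
    src.a = lo;
    src.b = hi;

    foo512 d1, d2;
    std::memset(&d1, 0, sizeof d1);
    std::memset(&d2, 0, sizeof d2);
    copy1(&d1, src);
    copy2(&d2, src);

    assert(std::memcmp(&d1, &src, sizeof src) == 0);
    assert(std::memcmp(&d2, &src, sizeof src) == 0);
}
```

Compiling such a file with -O2 -fdump-tree-optimized writes the optimized
GIMPLE dump next to the object file, and adding -S shows the resulting
assembly for the two functions side by side. Which -march or -m option is
needed to match the report is not stated here, so that part is left open.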

I agree that we should generate the same code for both (ideally we would reach
expand with essentially the same GIMPLE representation, although I am not sure
how).

A question is whether the memcpy expansion is optimal for that target. It could
be that as long as you are only copying a rather small object, it isn't worth
switching to larger registers, which cause a drop in the processor frequency.
However, the generated code does not change if I use other AVX instructions
nearby. With -Os, we can generate 'rep movsl' for copy1.
