https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #8)
> > DSE can remove redundant load/store for TI, but not OI/XI.

DSE can remove redundant load/store for OI/XI just fine; just remove the last 7
from the string so that it is 48 bytes instead of 49 and all of a sudden it
works fine.
It is indeed due to:

> It is due to overlapping store.

this.
I wonder if we couldn't special-case overlapping stores if they are loaded from
the constant pool and the overlapping bytes have the same values.
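
To make the condition concrete, here is a rough sketch in plain C of the
check such a special case would need. This is not DSE code; the helper
name overlap_bytes_agree and its parameters are made up for illustration.
It takes the constant bytes of two stores plus their destination offsets
and reports whether the bytes written to the overlapping range are
identical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a GCC internal.  A and B are the constant
   bytes of two stores; A_OFF/B_OFF are their destination offsets and
   A_LEN/B_LEN their sizes.  Return 1 if both stores write the same
   value to every destination byte they share, 0 otherwise.  */
static int
overlap_bytes_agree (const unsigned char *a, size_t a_off, size_t a_len,
                     const unsigned char *b, size_t b_off, size_t b_len)
{
  assert (a != NULL && b != NULL);

  /* Intersection of [a_off, a_off + a_len) and [b_off, b_off + b_len).  */
  size_t lo = a_off > b_off ? a_off : b_off;
  size_t a_end = a_off + a_len;
  size_t b_end = b_off + b_len;
  size_t hi = a_end < b_end ? a_end : b_end;

  if (lo >= hi)
    return 1;  /* The stores do not overlap at all.  */

  /* Compare the slice of each constant that lands in the overlap.  */
  return memcmp (a + (lo - a_off), b + (lo - b_off), hi - lo) == 0;
}
```

For the 49-byte string the two OI stores would be at offsets 0 and 17, so
the overlap is bytes 17..31, and since both constants come from the same
string those bytes agree by construction.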

And for the backend, the question is how big the penalty for the overlapping
store is compared to doing multiple non-overlapping stores.  Say for those 49
bytes one could do one OI, one TI/V1TI and one QI load/store as opposed to
one aligned and one misaligned OI load/store.
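
The two strategies can be sketched in portable C as follows, with OI, TI
and QI standing for 32-, 16- and 1-byte moves. The function names are
made up and the fixed-size memcpy calls only stand in for the vector
moves; this shows which bytes each variant touches, not what the backend
emits.

```c
#include <assert.h>
#include <string.h>

/* Two 32-byte moves at offsets 0 and 17; destination bytes 17..31 are
   written twice with the same values.  Both loads happen before the
   stores.  */
static void
copy49_overlapping (unsigned char *dst, const unsigned char *src)
{
  unsigned char lo[32], hi[32];

  assert (dst != NULL && src != NULL);
  memcpy (lo, src, 32);        /* OI load of bytes 0..31.  */
  memcpy (hi, src + 17, 32);   /* OI load of bytes 17..48.  */
  memcpy (dst, lo, 32);        /* OI store.  */
  memcpy (dst + 17, hi, 32);   /* Overlapping OI store.  */
}

/* One 32-byte, one 16-byte and one 1-byte move; no byte is written
   more than once.  */
static void
copy49_split (unsigned char *dst, const unsigned char *src)
{
  assert (dst != NULL && src != NULL);
  memcpy (dst, src, 32);            /* OI: bytes 0..31.  */
  memcpy (dst + 32, src + 32, 16);  /* TI/V1TI: bytes 32..47.  */
  dst[48] = src[48];                /* QI: byte 48.  */
}
```

Both variants produce the same 49 bytes; the difference is two wide
moves with a doubly written range versus three moves without one.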

For say:
void
foo (void *p, void *q)
{
  __builtin_memcpy (p, q, 49);
}
we emit the 2 overlapping loads/stores for -mavx512f and 4 non-overlapping
loads/stores with say -mavx2.
