[Bug tree-optimization/92486] Wrong optimization: padding in structs is not copied even with memcpy

rguenther at suse dot de Fri, 15 Nov 2019 04:08:00 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92486

--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 15 Nov 2019, jamborm at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92486
> 
> --- Comment #13 from Martin Jambor <jamborm at gcc dot gnu.org> ---
> (In reply to rguent...@suse.de from comment #10)
> > [...] But total scalarization works with the premise
> > that we don't see any direct accesses to source or destination
> 
> That is not true, total scalarization just adds special artificial
> accesses to the aggregate to its data structures and hopes they will
> blend in nicely with whatever real accesses are already there, so
> that, for example, val can be propagated in:
> 
>   a.f = val;
>   d = c = b = a;
>   use (d.f);

OK, but these cases are handled by FRE just fine.  What is missing
elsewhere is the plain

 a = c = d;

where SRA manages to get rid of 'c' by total scalarization
(so the "SRA does aggregate copy prop" thing).
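
As a rough illustration of that "aggregate copy prop" effect, here is a minimal C sketch. The type `struct pair` and both function names are hypothetical and only serve the example; this is not what SRA literally emits, just the source-level shape of the transformation.

```c
/* Hypothetical aggregate with two scalar fields and no padding. */
struct pair {
    int x;
    int y;
};

/* The plain chain "a = c = d": d is copied to a through the temporary c. */
static void copy_via_temp(struct pair *a, const struct pair *d)
{
    struct pair c;
    c = *d;
    *a = c;
}

/* Roughly what remains once c is totally scalarized: the aggregate c
   is gone and only per-field scalar replacements are left. */
static void copy_scalarized(struct pair *a, const struct pair *d)
{
    int c_x = d->x;
    int c_y = d->y;
    a->x = c_x;
    a->y = c_y;
}
```

Note that the field-wise form says nothing about padding bytes, which is exactly what this bug is about; the sketch avoids the issue by using a type without padding.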

But yes, for your example we'd copy-prop out c and b, which
might then have created "mismatching" accesses.  Btw, I had
the impression that we propagate accesses from 'a' to b, c and d
in the above case; if we do, don't we only have to add "fake"
accesses to the remaining parts?

> This is a problem because SRA does not like scalar accesses within
> scalar accesses (I originally wanted to support it but then backed
> out almost immediately) so if f did not happen to be exactly the size
> of the copy step, SRA would give up on the aggregate.  This is a
> limitation which we'd have to lift first, I'm afraid.
> 
> > so
> > I think we should simply change "total scalarization" to be
> > "emit the block-copy on GIMPLE".  Preferably without "crossing"
> > field boundaries but covering padding by choosing larger accesses.
> 
> Assuming we almost never want to decrease the step size all the way to
> a char, this will not always help us to deal with the problem with
> overlapping scalar accesses.

That's true, but if we do have scalar accesses then we can just reuse
those.  For the original example with a char, an int and three
bytes of padding, when we originally have a char access we'd have to
come up with something for the three bytes of padding.
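
To make the layout concrete, a small sketch in C. `struct s` is a hypothetical stand-in for the struct in the report, and the helper names are made up for illustration. The "three bytes" figure assumes a target where int is 4 bytes and 4-byte aligned, so the code computes the padding instead of hard-coding it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct in the original report. */
struct s {
    char c;
    int i;
};

/* Bytes of padding between the end of c and the start of i. */
static size_t padding_after_c(void)
{
    size_t end_of_c = offsetof(struct s, c) + sizeof(char);
    size_t start_of_i = offsetof(struct s, i);
    assert(start_of_i >= end_of_c);
    return start_of_i - end_of_c;
}

/* Size of a single access that starts at c and also covers the
   padding, i.e. everything up to (but not including) i. */
static size_t covering_access_size(void)
{
    return sizeof(char) + padding_after_c();
}
```

On such a target a "larger access" covering c plus its padding would be 4 bytes wide, which is the case where an existing 1-byte char access gets in the way.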

The case of fully contained sub-accesses is probably easy to handle
via BIT_FIELD_REFs but when one access overlaps two others things
get interesting ... (two BIT_FIELD_REFs plus one BIT_INSERT_EXPR
for the combination).  Code generation might also become awkward.
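
A rough C analogue of that combination, as a sketch only. The helpers `bit_field_ref` and `bit_insert` are hypothetical and merely mimic the idea of BIT_FIELD_REF and BIT_INSERT_EXPR on 32-bit values; they make no claim about the operand conventions of the GIMPLE tree codes. The example assumes two 32-bit scalar replacements `lo` (bits 0..31 of the region) and `hi` (bits 32..63), and a 32-bit access covering bits 16..47.

```c
#include <stdint.h>

/* Hypothetical analogue of BIT_FIELD_REF: extract `width` bits of
   `value` starting at bit `pos` (pos < 32, 1 <= width <= 32). */
static uint32_t bit_field_ref(uint32_t value, unsigned pos, unsigned width)
{
    uint32_t mask = width >= 32 ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (value >> pos) & mask;
}

/* Hypothetical analogue of BIT_INSERT_EXPR: return `dst` with `width`
   bits starting at bit `pos` replaced by the low bits of `src`. */
static uint32_t bit_insert(uint32_t dst, uint32_t src, unsigned pos,
                           unsigned width)
{
    uint32_t mask = width >= 32 ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (dst & ~(mask << pos)) | ((src & mask) << pos);
}

/* One access overlapping two others: two extractions plus one
   insertion to combine them. */
static uint32_t read_overlapping(uint32_t lo, uint32_t hi)
{
    uint32_t from_lo = bit_field_ref(lo, 16, 16); /* upper half of lo */
    uint32_t from_hi = bit_field_ref(hi, 0, 16);  /* lower half of hi */
    return bit_insert(from_lo, from_hi, 16, 16);
}
```

The fully contained case needs only the single `bit_field_ref` call; the overlapping case is where the extra insertion step comes in.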
