https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92486
--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 15 Nov 2019, jamborm at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92486
>
> --- Comment #13 from Martin Jambor <jamborm at gcc dot gnu.org> ---
> (In reply to rguent...@suse.de from comment #10)
> > [...] But total scalarization works with the premise
> > that we don't see any direct accesses to source or destination
>
> That is not true, total scalarization just adds special artificial
> accesses to the aggregate to its data structures and hopes they will
> blend in nicely with whatever real accesses are already there, so
> that, for example, val can be propagated in:
>
>   a.f = val;
>   d = c = b = a;
>   use (d.f);

OK, but these cases are handled by FRE just fine.  What is missing
elsewhere is the plain a = c = d; where SRA manages to get rid of 'c'
by total scalarization (so the "SRA does aggregate copy prop" thing).
But yes, for your example we'd copy-prop out c and b which might then
have created "mismatching" accesses.  Btw, I had the impression that
we propagate accesses from 'a' to b, c and d in the above case, if we
do we only have to add "fake" accesses to the remaining parts?

> This is a problem because SRA does not like scalar accesses within
> scalar accesses (I originally wanted to supported it but then backed
> out almost immediately) so if f did not happen to be exactly the size
> of the copy step, SRA would give up on the aggregate.  This is a
> limitation which we'd have to lift first, I'm afraid.
>
> > so
> > I think we should simply change "total scalarization" to be
> > "emit the block-copy on GIMPLE".  Preferably without "crossing"
> > field boundaries but covering padding by choosing larger accesses.
>
> Assuming we almost never want to decrease the step size all the way to
> a char, this will not always help us to deal with the problem with
> overlapping scalar accesses.

That's true, but if we do have scalar accesses then we can just reuse
those.  For the original example with a char, an int and three bytes of
padding, when we originally have a char access we'd have to come up
with something for the three bytes of padding.  The case of fully
contained sub-accesses is probably easy to handle via BIT_FIELD_REFs,
but when one access overlaps two others things get interesting ...
(two BIT_FIELD_REFs plus one BIT_INSERT_EXPR for the combination).
Code generation might also become awkward.
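
To make the padding and overlap cases concrete, here is a rough
source-level sketch in C.  It is only an analogy, not what SRA emits:
struct S, copy_fields, copy_steps and straddle are hypothetical names
made up for illustration, the three bytes of padding assume a target
with a 4-byte, 4-byte-aligned int, and the shifts in straddle work on
values rather than on memory bytes, so they say nothing about
endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregate: a char followed by an int.  On targets with
   a 4-byte aligned int there are three bytes of padding after 'c'
   (an assumption about the ABI, not a guarantee of the C standard).  */
struct S { char c; int i; };

/* Field-wise copy, roughly what scalarizing into 'c' and 'i' gives.
   The padding bytes are not copied.  */
void
copy_fields (struct S *dst, const struct S *src)
{
  assert (dst != NULL && src != NULL);
  dst->c = src->c;
  dst->i = src->i;
}

/* Block-style copy in steps of up to four bytes.  Each step covers
   whatever lies in its range, padding included, so no separate
   access is needed for the bytes after 'c'.  */
void
copy_steps (struct S *dst, const struct S *src)
{
  assert (dst != NULL && src != NULL);
  unsigned char *d = (unsigned char *) dst;
  const unsigned char *s = (const unsigned char *) src;
  size_t off = 0;

  while (off < sizeof (struct S))
    {
      size_t left = sizeof (struct S) - off;
      size_t n = left < sizeof (uint32_t) ? left : sizeof (uint32_t);
      uint32_t w = 0;

      memcpy (&w, s + off, n);   /* load one step */
      memcpy (d + off, &w, n);   /* store it unchanged */
      off += n;
    }
}

/* One access overlapping two others: build a 32-bit value from the
   upper half of 'lo' and the lower half of 'hi'.  The two extractions
   loosely correspond to two BIT_FIELD_REFs and the final combination
   to one BIT_INSERT_EXPR.  */
uint32_t
straddle (uint32_t lo, uint32_t hi)
{
  uint32_t from_lo = lo >> 16;        /* bits 16..31 of lo */
  uint32_t from_hi = hi & 0xffffu;    /* bits 0..15 of hi */

  return from_lo | (from_hi << 16);   /* insert from_hi at bit 16 */
}
```

With copy_steps the padding simply rides along in the wider access,
while copy_fields leaves it alone; straddle shows why the overlapping
case needs three operations where a fully contained sub-access needs
only one extraction.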