https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728
--- Comment #11 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 24 Mar 2021, kretz at kde dot org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728
>
> --- Comment #10 from Matthias Kretz (Vir) <kretz at kde dot org> ---
> Is this the same issue:
>
> struct A {
>   double v;
> };
>
> struct B {
>   double v;
>   B& operator=(const B& rhs) {
>     v = rhs.v;
>     return *this;
>   }
> };
>
> // 10 loads & stores
> void f(A& a, const A& b) {
>   for (int i = 0; i < 10; ++i) a = b;
> }
>
> // 1 load & store
> void f(B& a, const B& b) {
>   for (int i = 0; i < 10; ++i) a = b;
> }
>
> I.e. by turning the aggregate assignment into an explicit assignment of the
> fundamental (including vector) type, the issue can be worked around.

Yes, that's a good cut-down of the issue at hand.  Using memcpy ()
for the assignment also tends to work since we inline that using
a load/store sequence if we can.

Note this might also point at a solution in the C++ FE and its
default-generated copy/assign operations to copy the ultimate
single element there rather than the containing aggregate.
(one could argue that this would be premature optimization of course)
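
For illustration, the memcpy () variant mentioned in the reply might look like the sketch below. The names assign_via_memcpy and f_memcpy are hypothetical and do not appear in the bug report; struct A is repeated from comment #10 so the snippet compiles on its own. Whether the redundant loads and stores are actually removed depends on the GCC version and optimization flags, so no particular code generation is claimed here.

```cpp
#include <cstring>
#include <type_traits>

struct A {
  double v;
};

// memcpy on an object is only well-defined for trivially copyable types.
static_assert(std::is_trivially_copyable<A>::value,
              "A must be trivially copyable to be copied with memcpy");

// Hypothetical helper (not from the bug report): copy the object
// representation with memcpy instead of using aggregate assignment.
inline void assign_via_memcpy(A& dst, const A& src) {
  std::memcpy(&dst, &src, sizeof(A));
}

// Same loop as f(A&, const A&) in comment #10, with the aggregate
// assignment replaced by the memcpy-based helper.
void f_memcpy(A& a, const A& b) {
  for (int i = 0; i < 10; ++i) assign_via_memcpy(a, b);
}
```

Comparing the output of -O2 -S for f_memcpy against the two f overloads above is one way to check whether the workaround takes effect with a given compiler.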