https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
          Component|c++                         |tree-optimization
   Last reconfirmed|                            |2021-03-24

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue for store motion is that we see an aggregate copy:

  Unanalyzed memory reference 0: *d_28(D).lam1 = *d_28(D).lam2;
  __MEM <struct s0data_s> (d_28(D)).lam1 = __MEM <struct s0data_s> (d_28(D)).lam2;
  __MEM <struct s0data_s> (d_28(D)).lam2.v = _38;
  il_36 = il_74 + 1ul;

There is another PR about these unanalyzed refs preventing LIM/SM, but getting rid of those aggregate copies would be nice as well, since many passes do not handle them well. I suppose 'vtype' in this case has an FP mode, which prevents us from simplistically folding this (unless we always expanded such copies to FP load/store sequences).

Indeed, we're copying a type that has V4DF mode:

  <record_type 0x7ffff437dbd0 Tvsimple sizes-gimplified needs-constructing cxx-odr-p type_1 type_5 type_6 V4DF
      size <integer_cst 0x7ffff658b228 constant 256>
      unit-size <integer_cst 0x7ffff658b318 constant 32>
      align:256 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type 0x7ffff437dbd0
      fields <function_decl 0x7ffff4391400 operator=
          type <method_type 0x7ffff43933f0>
          public external autoinline decl_3 QI t.C:3:8 align:16 warn_if_not_align:0 context <record_type 0x7ffff437dbd0 Tvsimple>
          full-name "constexpr Tvsimple& Tvsimple::operator=(Tvsimple&&) noexcept (<uninstantiated>)"
          not-really-extern chain <function_decl 0x7ffff4391300 operator=>> context <translation_unit_decl 0x7ffff6578168 t.C>
      full-name "struct Tvsimple"
      needs-constructor X() X(constX&) this=(X&) n_parents=0 use_template=0 interface-unknown
      pointer_to_this <pointer_type 0x7ffff437dd20> reference_to_this <reference_type 0x7ffff437d690> chain <type_decl 0x7ffff4a554c0 Tvsimple>>

OK, so for a simple

  struct X { double x; };
  void foo (struct X *x, struct X *y) { *x = *y; }

we do generate x87 FP load/store insns and do not transfer bytes. That is probably OK from a C language perspective but questionable on the GIMPLE side (we've been there before). So one thing we can experiment with is gimplifying those aggregate copies to register loads/stores when the target has assigned the aggregates a non-BLKmode. This might, of course, confuse SRA, which means SRA itself might be a better place to perform this optimization. [Mind that struct { double; double; } gets TImode on x86, for example.]
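A minimal C sketch of why a copy through FP registers is not quite the same as a byte copy, assuming an ordinary hosted target. The helpers `copy_aggregate`, `copy_via_fp_register`, `copy_bytes` and `bits_of` are hypothetical names for illustration, not GCC internals, and the sketch does not control which instructions the compiler actually picks for `copy_aggregate`. For ordinary values all three copies agree. The difference only shows for bit patterns the FP unit may canonicalize: on x87, an `fld` of a 64-bit signaling NaN raises invalid-operation and (if masked) yields a quiet NaN, so a copy routed through x87 registers need not preserve the exact bytes, while `memcpy` always does. That is presumably what makes the FP lowering questionable for GIMPLE, where an aggregate copy is understood as moving bytes.

```c
#include <stdint.h>
#include <string.h>

/* The struct from the comment; most targets give it DFmode, not BLKmode. */
struct X { double x; };

/* The aggregate copy as written; GIMPLE keeps it as one assignment *x = *y. */
void copy_aggregate (struct X *x, const struct X *y)
{
  *x = *y;
}

/* Hypothetical register-mode lowering: load the value into an FP
   register and store it back.  On x87 this may quiet a signaling NaN. */
void copy_via_fp_register (struct X *x, const struct X *y)
{
  double tmp = y->x;  /* FP load */
  x->x = tmp;         /* FP store */
}

/* Bytewise copy: always transfers the exact bit pattern. */
void copy_bytes (struct X *x, const struct X *y)
{
  memcpy (x, y, sizeof *x);
}

/* Return the raw 64-bit pattern stored in a struct X. */
uint64_t bits_of (const struct X *x)
{
  uint64_t u;
  memcpy (&u, &x->x, sizeof u);
  return u;
}
```

Only the bytewise variant is guaranteed to round-trip every bit pattern; whether the other two do depends on the target's FP unit.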
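To illustrate what LIM/SM is being kept from doing, here is a hypothetical C loop with the same shape as the GIMPLE dump (an aggregate copy `lam1 = lam2` followed by a store to `lam2.v`), next to the result of applying store motion by hand. The types `struct vsimple` and `struct s0data`, the `in` array and both function names are made up; this is not the PR's testcase, and a plain `double` member stands in for the V4DF vector. The hand-transformed version is only equivalent when `in` cannot alias `*d` (both are `double`, so type-based aliasing does not help), which is exactly what the compiler must prove first.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the types in the dump; only the field
   names lam1, lam2 and v are taken from it. */
struct vsimple { double v; };
struct s0data { struct vsimple lam1, lam2; };

/* Loop shape from the dump: an aggregate copy, then a scalar store.
   The aggregate copy is the "unanalyzed memory reference". */
void loop_with_aggregate_copy (struct s0data *d, const double *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      d->lam1 = d->lam2;   /* aggregate copy */
      d->lam2.v = in[i];   /* scalar store */
    }
}

/* Store motion applied by hand: keep both members in registers across
   the loop and write them back once afterwards.  Assumes in does not
   alias *d. */
void loop_after_store_motion (struct s0data *d, const double *in, size_t n)
{
  double lam1 = d->lam1.v;   /* loads hoisted before the loop */
  double lam2 = d->lam2.v;
  for (size_t i = 0; i < n; i++)
    {
      lam1 = lam2;
      lam2 = in[i];
    }
  d->lam1.v = lam1;          /* stores sunk after the loop */
  d->lam2.v = lam2;
}
```

Once the aggregate copy is expressed as a load/store of `v`, both memory references become analyzable and the loop body reduces to register moves.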