https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
          Component|c++                         |tree-optimization
   Last reconfirmed|                            |2021-03-24

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue for store-motion is that we see an aggregate copy:

Unanalyzed memory reference 0: *d_28(D).lam1 = *d_28(D).lam2;

  __MEM <struct s0data_s> (d_28(D)).lam1 = __MEM <struct s0data_s> (d_28(D)).lam2;
  __MEM <struct s0data_s> (d_28(D)).lam2.v = _38;
  il_36 = il_74 + 1ul;

There is another PR about those Unanalyzed refs preventing LIM/SM, but
getting rid of these aggregate copies would be nice as well, since many
passes do not like them.  I suppose 'vtype' in this case has an FP mode,
which prevents us from simplistically folding this (unless we always
expanded those to FP load/store sequences).

Indeed, we're copying

    type <record_type 0x7ffff437dbd0 Tvsimple sizes-gimplified needs-constructing cxx-odr-p type_1 type_5 type_6 V4DF
        size <integer_cst 0x7ffff658b228 constant 256>
        unit-size <integer_cst 0x7ffff658b318 constant 32>
        align:256 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type 0x7ffff437dbd0
        fields <function_decl 0x7ffff4391400 operator= type <method_type 0x7ffff43933f0>
            public external autoinline decl_3 QI t.C:3:8 align:16 warn_if_not_align:0 context <record_type 0x7ffff437dbd0 Tvsimple>
            full-name "constexpr Tvsimple& Tvsimple::operator=(Tvsimple&&) noexcept (<uninstantiated>)"
            not-really-extern chain <function_decl 0x7ffff4391300 operator=>> context <translation_unit_decl 0x7ffff6578168 t.C>
        full-name "struct Tvsimple"
        needs-constructor X() X(constX&) this=(X&) n_parents=0 use_template=0 interface-unknown
        pointer_to_this <pointer_type 0x7ffff437dd20> reference_to_this <reference_type 0x7ffff437d690> chain <type_decl 0x7ffff4a554c0 Tvsimple>>

OK, so for a simple

struct X { double x; };

void foo (struct X *x, struct X *y)
{
  *x = *y;
}

we do generate x87 FP load/store insns and do not transfer bytes.  Probably
OK from a C language perspective but questionable on the GIMPLE side
(we've been there before).
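
To make the "transfer bytes" distinction concrete, here is a minimal sketch
(not taken from the PR; the helper names copy_bytes, copy_assign, bits_of and
set_bits are made up for illustration).  A memcpy-based copy is guaranteed to
preserve the object representation, while the plain aggregate assignment is
only required to copy the value.  As far as I understand, an x87 load/store
round trip of a 64-bit signaling NaN quiets it, so that is one case where the
two could in principle differ; the sketch makes no claim about what any
particular GCC version emits.

```c
#include <stdint.h>
#include <string.h>

struct X { double x; };

/* Assumption: double is a 64-bit IEEE type on the target.  */
_Static_assert (sizeof (double) == sizeof (uint64_t), "64-bit double expected");

/* Byte-wise copy: the object representation is preserved exactly.  */
void copy_bytes (struct X *dst, const struct X *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Aggregate assignment, same shape as foo () above.  The compiler is free
   to implement this with FP load/store instructions.  */
void copy_assign (struct X *dst, const struct X *src)
{
  *dst = *src;
}

/* Helpers to inspect and set the raw bit pattern without going through an
   FP register at the source level.  */
uint64_t bits_of (const struct X *p)
{
  uint64_t u;
  memcpy (&u, &p->x, sizeof u);
  return u;
}

void set_bits (struct X *p, uint64_t u)
{
  memcpy (&p->x, &u, sizeof u);
}
```

Setting a signaling-NaN pattern such as 0x7ff0000000000001 with set_bits and
comparing bits_of after each copy variant is one way to check what a given
target and option set actually does.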

So one thing we can experiment with is to gimplify those aggregate
copies to register load/store when the target has assigned the
aggregates a non-BLKmode mode.  This might, of course, confuse SRA,
which means that SRA itself might be a better place to perform this
optimization.  (Mind that struct { double; double; } gets TImode on
x86, for example.)
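
As a source-level illustration of what such a lowering would buy, here is a
hand-written sketch.  The types wrap and state and the loop body are
hypothetical stand-ins for Tvsimple, s0data_s and the real computation, which
the dump above does not show.  In the first function the aggregate copy sits
in the loop, as in the GIMPLE above; in the second it has been turned into
scalar load/store by hand, after which the values can live in registers
across the loop and be stored once at the end.  The hand-written version
additionally assumes that `in` does not alias `*d`.

```c
struct wrap { double v; };                 /* hypothetical stand-in for Tvsimple */
struct state { struct wrap lam1, lam2; };  /* hypothetical stand-in for s0data_s */

/* Aggregate copy inside the loop, mirroring lam1 = lam2; lam2.v = ...  */
void step_aggregate (struct state *d, const double *in, unsigned long n)
{
  for (unsigned long il = 0; il < n; ++il)
    {
      d->lam1 = d->lam2;                     /* aggregate copy */
      d->lam2.v = d->lam1.v * in[il] + 1.0;  /* made-up update */
    }
}

/* The same loop with the copy expressed as scalar load/store and the
   stores moved out of the loop by hand.  Assumes in[] does not overlap *d.  */
void step_scalar (struct state *d, const double *in, unsigned long n)
{
  double lam1 = d->lam1.v;
  double lam2 = d->lam2.v;
  for (unsigned long il = 0; il < n; ++il)
    {
      lam1 = lam2;
      lam2 = lam1 * in[il] + 1.0;
    }
  d->lam1.v = lam1;
  d->lam2.v = lam2;
}
```

Both functions leave the same values in *d for non-overlapping inputs; the
second form is roughly what store motion would be expected to reach on its
own once the aggregate copy no longer shows up as an unanalyzed reference.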
