https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118076

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
In the RISC-V case it is optimized because the structure is copied into the
argument area using 4 DImode loads + stores rather than 2 TImode loads +
stores.
In that case it is actually cse1 which sees through the memory stores, so for
s.x = x
s.y = y
s.z = z
s.w = w
t1 = s.x
t2 = s.y
t3 = s.z
t4 = s.w
arg[0] = t1
arg[1] = t2
arg[2] = t3
arg[3] = t4
it replaces the 4 middle insns (the loads from s) with t1 = x; t2 = y; t3 = z;
t4 = w.
After that, dse1 removes the first 4 insns, since the stores to s are then dead.
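
The effect of the two passes can be sketched at the source level. This is only
an illustration, not the testcase from the PR: struct S, its uint64_t fields
(chosen so that each field is one DImode-sized value) and all function names
below are assumptions made for the example. The first function mirrors the
insn sequence listed above; the second is roughly what remains once the loads
have been forwarded and the dead stores removed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregate: four 64-bit fields, one DImode-sized value each. */
struct S { uint64_t x, y, z, w; };

/* Shape of the sequence before cse1: every value takes a round trip
   through the temporary s before it reaches the argument area. */
static void copy_via_struct(uint64_t x, uint64_t y, uint64_t z, uint64_t w,
                            uint64_t arg[4])
{
    struct S s;
    s.x = x; s.y = y; s.z = z; s.w = w;                /* 4 stores to s   */
    uint64_t t1 = s.x, t2 = s.y, t3 = s.z, t4 = s.w;   /* 4 loads from s  */
    arg[0] = t1; arg[1] = t2; arg[2] = t3; arg[3] = t4;
}

/* Shape after the loads are replaced by the stored values and the
   now-unused stores to s are deleted: s is gone entirely. */
static void copy_forwarded(uint64_t x, uint64_t y, uint64_t z, uint64_t w,
                           uint64_t arg[4])
{
    uint64_t t1 = x, t2 = y, t3 = z, t4 = w;
    arg[0] = t1; arg[1] = t2; arg[2] = t3; arg[3] = t4;
}

/* Both shapes must fill the argument area identically. */
static void check_equivalent(uint64_t x, uint64_t y, uint64_t z, uint64_t w)
{
    uint64_t a[4], b[4];
    copy_via_struct(x, y, z, w, a);
    copy_forwarded(x, y, z, w, b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Calling check_equivalent(1, 2, 3, 4) from a test driver confirms that the two
shapes agree; the point of the sketch is only that the forwarding is possible
when each load matches one earlier store of the same width.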
