https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103592
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
23.13%  44783  a.out.vect    a.out.vect    [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.isra.0
 2.40%   4641  a.out.vect    a.out.vect    [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.isra.0
 2.37%   4613  a.out.novect  a.out.novect  [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.isra.0
 1.23%   2383  a.out.vect    libc-2.31.so  [.] __memset_avx2_unaligned_erms
 0.35%    676  a.out.vect    libc-2.31.so  [.] __memset_avx2_unaligned
 0.20%    394  a.out.novect  a.out.novect  [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.isra.0
We end up doing loop vectorization with a lot of invariants built up from
scalars but with only a single vector iteration known to run. We also have
a local array that's only elided after vectorization, causing the final
stores to require vector extracts.
I think this is the usual case of vectorization constraining OOO execution
in the face of the code being limited by load & store.
We also fail to elide generalized_constitutive_tensor - FRE can do this
in principle - there's a duplicate PR for this and the situation is like
generalized_constitutive_tensor = {};
...
generalized_constitutive_tensor[0] = _19;
generalized_constitutive_tensor[1] = ISRA.833_76(D);
generalized_constitutive_tensor[2] = ISRA.833_76(D);
...
vect__14.843_125 = MEM <vector(4) real(kind=8)> [(real(kind=8) *)&generalized_constitutive_tensor];
where FRE could create a { _19, ISRA.833_76(D), ISRA.833_76(D), 0. }
vector CTOR but that's only profitable if the stores go away. I have
a patch to do that (w/o the costing).
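
To make the shape of that transform concrete, here is a minimal C sketch
using the GCC/Clang vector extension. It is not the testcase from this PR:
the function names, the parameters a and b (standing in for _19 and
ISRA.833_76(D)) and the array size of 36 are assumptions for illustration
only. The first function mirrors the dump above (zero the array, store a
few scalars, read four elements back as one vector); the second builds the
same value directly as a vector CTOR, so the array and its stores are no
longer needed.

```c
#include <string.h>

/* Four doubles in one vector (GCC/Clang vector extension). */
typedef double v4df __attribute__((vector_size(32)));

/* Mirrors the dump: zero-init a local array, store a few scalars,
   then read the first four elements back as a single vector.
   The size 36 is an assumption, not taken from the PR. */
void load_via_array(double a, double b, double out[4])
{
    double tensor[36] = {0};
    tensor[0] = a;
    tensor[1] = b;
    tensor[2] = b;

    v4df v;
    memcpy(&v, tensor, sizeof v);   /* the vector load from memory */
    memcpy(out, &v, sizeof v);
}

/* The forwarded form: same value built from the scalars directly,
   with the untouched element taken from the zero initializer. */
void load_via_ctor(double a, double b, double out[4])
{
    v4df v = { a, b, b, 0.0 };
    memcpy(out, &v, sizeof v);
}
```

Both functions produce the same four doubles. Whether the second form is
cheaper depends on the stores to the array actually going away, which is
the costing question mentioned above.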
Note that in the non-vectorized case we are able to elide
generalized_constitutive_tensor and also CSE a lot of the computations
because the tensor only has 4 distinct values (and some are even zero).
So it's really a very special case ...
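
As a rough illustration of why the scalar code simplifies so much, the
sketch below assumes the tensor has the textbook isotropic form in Voigt
notation (lam + 2*mu, lam, mu and 0, with engineering shear strains). That
form is an assumption made for the example, not something read from the
benchmark source, and all function names are hypothetical. The generic
routine goes through the 6x6 array; the folded routine is what is left
once the array is elided, zero terms drop out and the repeated values are
shared.

```c
/* Fill a 6x6 stiffness matrix from the assumed isotropic form:
   only lam + 2*mu, lam, mu and 0 ever appear. */
void fill_isotropic(double lam, double mu, double C[6][6])
{
    for (int i = 0; i < 6; i++)
        for (int j = 0; j < 6; j++)
            C[i][j] = 0.0;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++)
            C[i][j] = lam;
        C[i][i] = lam + 2.0 * mu;     /* normal diagonal */
        C[i + 3][i + 3] = mu;         /* shear diagonal  */
    }
}

/* Generic form: 36 multiply-adds through the array. */
void stress_generic(double C[6][6], const double e[6], double s[6])
{
    for (int i = 0; i < 6; i++) {
        s[i] = 0.0;
        for (int j = 0; j < 6; j++)
            s[i] += C[i][j] * e[j];
    }
}

/* Folded form: no array, no multiplies by zero, c11 computed once. */
void stress_folded(double lam, double mu, const double e[6], double s[6])
{
    double c11 = lam + 2.0 * mu;
    s[0] = c11 * e[0] + lam * e[1] + lam * e[2];
    s[1] = lam * e[0] + c11 * e[1] + lam * e[2];
    s[2] = lam * e[0] + lam * e[1] + c11 * e[2];
    s[3] = mu * e[3];
    s[4] = mu * e[4];
    s[5] = mu * e[5];
}
```

The folded routine needs 12 multiplies instead of 36 and never touches
memory for the tensor, which is the kind of code the scalar path can
reach but the vectorized path, loading from the array, cannot.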