[Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 17 Feb 2022 23:27:01 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582


--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #5)
> The costs look weird:
> _1 1 times scalar_store costs 12 in body
> _5 1 times scalar_store costs 12 in body
> _1 1 times vector_store costs 12 in body
> <unknown> 1 times vec_construct costs 8 in prologue
> vec_construct is certainly more expensive than a store (especially in this
> case when it is a store into a TImode variable which isn't addressable and
> will not be in memory at all).

x86 can do cheap move low/hi so the construct isn't expensive.  Note
it only gets expensive in the end because the "memory" isn't really memory
and the return ABI isn't exposed.
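
For readers without the testcase at hand, the pattern under discussion looks
roughly like the sketch below.  It is modeled on the libgcc2 double-word
helpers, not copied from them: the names dw_t, dw_union and neg_dw are
illustrative assumptions, and the layout assumes a little-endian 64-bit
target with __int128 support.

```c
#include <stdint.h>

/* Illustrative names only; not the libgcc2 definitions. */
typedef __int128 dw_t;                 /* double-word type (TImode on x86-64) */

typedef union {
  struct { uint64_t low, high; } s;    /* little-endian word order assumed */
  dw_t ll;
} dw_union;

/* Negate a double-word value one word at a time. */
dw_t neg_dw(dw_t u)
{
  dw_union uu, w;                      /* 'w' never has its address taken */
  uu.ll = u;
  w.s.low = -uu.s.low;                         /* scalar store #1 */
  w.s.high = -uu.s.high - (w.s.low > 0);       /* scalar store #2, with borrow */
  return w.ll;                         /* returned in a GPR pair, not in memory */
}
```

The two adjacent stores into 'w' are what SLP may combine into one vector
store, even though 'w' is expected to live in registers.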

Just as a wild idea, maybe we can pessimize vector stores into
!TREE_ADDRESSABLE automatic variables ...

We do already have some "weird" code in vect_model_store_cost employing
hard_function_value to deal with stores to RESULT_DECLs, but here 'w'
isn't a RESULT_DECL.  That code assumes what actually happens: a spill
of the vector followed by loads of the components.

What's missing in the CTOR cost is the move from GPR to XMM regs when
we are not dealing with FP or vector components (or direct memory
sources).  Getting that applied only for relevant cases isn't easy
since it requires looking at the defs.
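
To make the arithmetic concrete, here is a small sketch of the comparison
using the numbers from the dump quoted above.  The per-element GPR-to-XMM
move cost is a hypothetical parameter (no such term exists in the dump),
and charging it once per component is my assumption, not what the cost
model does today.

```c
/* Costs as reported in the quoted dump. */
enum {
  SCALAR_STORE_COST  = 12,
  VECTOR_STORE_COST  = 12,
  VEC_CONSTRUCT_COST = 8
};

/* Two scalar stores in the body. */
int scalar_cost(void)
{
  return 2 * SCALAR_STORE_COST;
}

/* One vector store plus the CTOR in the prologue.  gpr_to_xmm_cost is a
   hypothetical extra charge per component coming from a GPR. */
int vector_cost(int gpr_to_xmm_cost)
{
  return VECTOR_STORE_COST + VEC_CONSTRUCT_COST + 2 * gpr_to_xmm_cost;
}

/* Nonzero when the vector variant looks strictly cheaper. */
int vectorization_looks_profitable(int gpr_to_xmm_cost)
{
  return vector_cost(gpr_to_xmm_cost) < scalar_cost();
}
```

With no move cost the vector side is 20 against 24 for scalar, so it wins;
under these assumptions any per-component move cost of 2 or more would tip
the comparison the other way.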

One could try to amend the vect_model_store_cost handling: at the
beginning of the SLP pass, analyze the stmts feeding the function return,
mark in some way the decls we return a loaded value from, and handle
stores to those decls in a similar way.

[Bug tree-optimization/104582] [11/12 Regression] Unoptimal code for __negdi2 (and others) from libgcc2 due to unwanted vectorization
