On 09/28/2016 05:41 AM, Richard Biener wrote:
Currently strided SLP vectorization creates vector constructors composed
of vector elements. This is a constructor form that is not handled
specially by the expander but it gets expanded via piecewise stores
to scratch memory and a load of that scratch memory.
Ugh. Yup, obviously bad, even without store forwarding.
does not work on any CPU I know of). The following patch simply avoids
the issue by making the vectorizer create integer loads, composing
a vector of those integers and then punning that to the desired vector
type. Thus (V4SF){V2SF, V2SF} becomes (V4SF)(V2DI){DI, DI} and
everybody is happy. Especially x264 gets a 5-10% improvement
(dependent on vector size and x86 sub-architecture).
Seems reasonable to me -- there's not a lot of difference (conceptually)
to how we've used SImode constants to construct DFmode constants in the
past.
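For readers following along, the (V4SF){V2SF, V2SF} -> (V4SF)(V2DI){DI, DI}
rewrite can be mimicked at the source level with GCC's generic vector
extensions. This is only a rough sketch of the idea, not the code the patch
emits: the type names v4sf/v2di and the helper load_strided_pairs are
made up for illustration, and the stride is counted in floats.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical typedefs standing in for the V4SF and V2DI modes. */
typedef float   v4sf __attribute__((vector_size(16)));
typedef int64_t v2di __attribute__((vector_size(16)));

/* Illustrative helper (not part of GCC): gather two pairs of floats that
   sit `stride` floats apart.  Each pair is loaded as one 64-bit integer,
   the two integers form a V2DI, and the result is punned to V4SF.  */
static v4sf load_strided_pairs(const float *base, size_t stride)
{
    int64_t lo, hi;

    /* memcpy gives a well-defined 8-byte integer load of two floats. */
    memcpy(&lo, base, sizeof lo);
    memcpy(&hi, base + stride, sizeof hi);

    /* Vector constructor with scalar integer elements: {DI, DI}. */
    v2di tmp = { lo, hi };

    /* Same-size vector cast reinterprets the bits, no value conversion. */
    return (v4sf) tmp;
}
```

With base pointing at {1, 2, 3, 4, 5, 6, 7, 8} and a stride of 4, the
result should hold lanes {1, 2, 5, 6}, since each 8-byte chunk keeps its
bytes in memory order and lane 0 of a vector is its lowest-addressed element.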
Handling the vector-vector constructors on the expander side would
require either similar punning or making vec_init parametric on
the element mode plus supporting vector elements in all targets
(which in the end probably will simply pun them similarly).
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
Any comments?
Thanks,
Richard.
2016-09-28 Richard Biener <rguent...@suse.de>
* tree-vect-stmts.c (vectorizable_load): Avoid emitting vector
constructors with vector elements.
Seems quite reasonable.
jeff