https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019

--- Comment #10 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Ah I see - the actual vector code isn't even that bad and the vec_constructs
aren't either.  The problem is rather that we have slow unaligned (scalar)
access with the default tune model.  Thus we need to load 8 individual uint8s
to actually load one long - of course the vec_init costs underestimate what's
really happening then.
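
To illustrate the point about slow unaligned scalar access, here is a
minimal sketch in C.  It is not the testcase from this PR; the function
names are hypothetical and the byte-wise variant assumes little-endian byte
order.  On a target where unaligned scalar loads are considered slow, a
64-bit load from a byte pointer of unknown alignment may be expanded into
something resembling the second function: eight byte loads combined with
shifts and ORs.

```c
#include <stdint.h>
#include <string.h>

/* Load 64 bits from a possibly unaligned address.  With fast unaligned
   access the compiler may turn this into a single 64-bit load.  */
uint64_t load_u64_memcpy(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Roughly what the load costs when unaligned access is slow: eight
   individual uint8 loads assembled into one 64-bit value
   (little-endian order assumed for this illustration).  */
uint64_t load_u64_bytewise(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}
```

If each element fed into a vec_construct is built like the second function,
a cost that only accounts for the vec_init itself misses most of the work.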

If we enable fast unaligned scalar access we choose a different vectorization
scheme, so the issue above is not relevant anymore...

Another issue is that we use the wrong vectype for costing the vec_construct. 
Will prepare a patch for that.
