https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103554

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com,
                   |                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-12-06
             Blocks|                            |53947
     Ever confirmed|0                           |1
             Target|                            |x86_64-*-*

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is

t.ii:8:11: note: ==> examining statement: _1 = in.d;
t.ii:8:11: missed: BB vectorization with gaps at the end of a load is not supported
t.ii:8:16: missed: not vectorized: relevant stmt not supported: _1 = in.d;
t.ii:8:11: note: Building vector operands of 0x447c768 from scalars instead

when trying to vectorize this with V4DI.  We don't realize that with a
visible decl we can load the gap.  Indeed I think I've seen this before as
well.

Note with SSE the same issue is present but we create the V2DI vectors in a
more optimal way from scalars.

With the above issue fixed we'd instead use two V4DI unaligned vector moves
from the stack and a shuffle.  The locally optimal solution would be two
unaligned V2DI loads and either two V2DI stores or a V4DI merge and store.

_Note_ that likely the suboptimal solution presented here is faster because
it avoids STLF penalties from the calls stack setup which very likely uses
scalar or differently aligned vector moves.

Note the x86 backend costs the SSE variant

t.ii:8:11: note: Cost model analysis for part in loop 0:
  Vector cost: 40
  Scalar cost: 48

and the AVX variant

t.ii:8:11: note: Cost model analysis for part in loop 0:
  Vector cost: 48
  Scalar cost: 48

but the x86 backend chooses to not let the vectorizer compare costs with
different vector sizes but instead asks it to pick the first working
solution from the vector of modes to consider (and in that order).  We might
want to reconsider that (maybe at least for BB vectorization and maybe with
some extra special mode?).
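
The closing remark about mode selection can be illustrated with a small sketch. This is not GCC code: `Candidate`, `pick_first_working`, `pick_cheapest` and `example_candidates` are hypothetical names invented for illustration, and the assumption that the AVX (V4DI) mode is listed before the SSE (V2DI) mode is mine, not stated in the comment. Only the two vector costs (48 and 40) are taken from the dumps quoted above.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one vector mode the vectorizer could try.
struct Candidate {
    std::string mode;  // e.g. "V4DI" or "V2DI"
    bool works;        // whether analysis succeeded for this mode
    int vector_cost;   // cost reported by the cost model
};

// Policy 1: take the first mode that works, in the order given.
// Returns the index of the chosen candidate, or -1 if none works.
int pick_first_working(const std::vector<Candidate>& cands) {
    for (int i = 0; i < static_cast<int>(cands.size()); ++i)
        if (cands[i].works) return i;
    return -1;
}

// Policy 2: compare all working modes and keep the cheapest one.
// On a tie the earlier candidate is kept. Returns -1 if none works.
int pick_cheapest(const std::vector<Candidate>& cands) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(cands.size()); ++i)
        if (cands[i].works &&
            (best < 0 || cands[i].vector_cost < cands[best].vector_cost))
            best = i;
    return best;
}

// Vector costs quoted in the comment: AVX (V4DI) 48, SSE (V2DI) 40.
// Assumption: the wider mode comes first in the list of modes to try.
std::vector<Candidate> example_candidates() {
    return {{"V4DI", true, 48}, {"V2DI", true, 40}};
}
```

Under these assumptions, `pick_first_working(example_candidates())` selects the V4DI entry while `pick_cheapest(example_candidates())` selects the V2DI entry, which is the difference the comment suggests reconsidering.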

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations