On Thu, Jun 2, 2022 at 12:55 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Richard, > > + /* RTL expansion knows how to expand rotates using shift/or. */ if > > + (icode == CODE_FOR_nothing > > + && (code == LROTATE_EXPR || code == RROTATE_EXPR) > > + && optab_handler (ior_optab, vec_mode) != CODE_FOR_nothing > > + && optab_handler (ashl_optab, vec_mode) != CODE_FOR_nothing) > > + icode = (int) optab_handler (lshr_optab, vec_mode); > > > > but we then get the vector costing wrong. > > The issue is that we currently get the (relative) vector costing wrong. > Currently for gcc.dg/vect/pr98674.c, the vectorizer thinks the scalar > code requires two shifts and an ior, so believes its profitable to vectorize > this loop using two vector shifts and an vector ior. But once match.pd > simplifies the truncate and recognizes the HImode rotate we end up with: > > pr98674.c:6:16: note: ==> examining statement: _6 = _1 r>> 8; > pr98674.c:6:16: note: vect_is_simple_use: vectype vector(8) short int > pr98674.c:6:16: note: vect_is_simple_use: operand 8, type of def: constant > pr98674.c:6:16: missed: op not supported by target. > pr98674.c:8:33: missed: not vectorized: relevant stmt not supported: _6 = > _1 r>> 8; > pr98674.c:6:16: missed: bad operation or unsupported loop bound. > > > Clearly, it's a win to vectorize HImode rotates, when the backend can perform > 8 (or 16) rotations at a time, but using 3 vector instructions, even when a > scalar > rotate can performed in a single instruction. Fundamentally, vectorization > may > still be desirable/profitable even when the backend doesn't provide an optab.
Yes, as said it's tree-vect-patterns.cc job to handle this not natively supported rotate by re-writing it. Can you check why vect_recog_rotate_pattern does not do this? Ah, the code only handles !TYPE_UNSIGNED (type) - not sure why though (for rotates it should not matter and for the lowered sequence we can convert to desired signedness to get arithmetic/logical shifts)? > The current situation where the i386's backend provides expanders to lower > rotations (or vcond) into individual instruction sequences, also interferes > with > vector costing. It's the vector cost function that needs to be fixed, not > the > generated code made worse (or the backend bloated performing its own > RTL expansion workarounds). > > Is it instead ok to mark pr98674.c as XFAIL (a regression)? > The tweak to tree-vect-stmts.cc was based on the assumption that we wished > to continue vectorizing this loop. Improving scalar code generation really > shouldn't disable vectorization like this. Yes, see above where the fix needs to be. The pattern will then expose the shift and ior to the vectorizer which then are properly costed. Richard. > > > Cheers, > Roger > -- > >