[Bug middle-end/115675] [15 Regression] truncv4hiv4qi affect r14-1402-gd8545fb2c71683's optimization.

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 24 Jan 2025 01:52:35 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115675


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2025-01-24
             Status|UNCONFIRMED                 |NEW
             Blocks|                            |53947
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  Note the BB vectorizer simply cobbles together as many operations
as it can: now that we have truncv4hiv4qi we happily vectorize the hi->qi
truncation, but we fail to also cover the loads, given that half of the lanes
have a shift.

We do have some heuristics that avoid operations on all "externs", but
only if they appear to be uniform.

  /* If we have all children of a child built up from uniform scalars
     or does more than one possibly expensive vector construction then
     just throw that away, causing it built up from scalars.
     The exception is the SLP node for the vector store.  */
...
      if (all_uniform_p
          || n_vector_builds > 1
          || (n_vector_builds == children.length ()
              && is_a <gphi *> (stmt_info->stmt)))
        {
          /* Roll back.  */

In this case we have !all_uniform_p and n_vector_builds == 1, so the roll
back does not trigger.  There's an exception for a "copy", but I think
scrapping all conversions would be bad (int<->float converts, for example,
are OK to vectorize).
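To make the condition easier to reason about, here is a minimal stand-alone model of the quoted roll-back test.  It is only a sketch: should_roll_back, n_children and is_phi are hypothetical names introduced for illustration (n_children stands in for children.length (), is_phi for the is_a <gphi *> check), and it is not the vectorizer's actual code.

```c
#include <stdbool.h>

/* Hypothetical model of the roll-back condition quoted above.
   n_children stands in for children.length () and is_phi for
   is_a <gphi *> (stmt_info->stmt); both names are assumptions.  */
static bool
should_roll_back (bool all_uniform_p, unsigned n_vector_builds,
                  unsigned n_children, bool is_phi)
{
  return all_uniform_p
         || n_vector_builds > 1
         || (n_vector_builds == n_children && is_phi);
}
```

With all_uniform_p false, n_vector_builds == 1 and a non-PHI statement, as in this bug, the function returns false whatever the child count, so the node built from scalars is kept.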

I think this is a case where SLP discovery should maybe have used

tem = load >> { 8, 0, 8, 0 };
tem2 = (vector(4) char) tem;
store = tem2;

thus this is the issue of SLP discovery failing to insert operations with a
neutral operand (here the shifts by 0).  This is of course still short of
detecting the bswap.
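The suggested form (a lane-wise shift by { 8, 0, 8, 0 } followed by a hi->qi truncation) can be modelled in plain C as below.  This is a sketch under assumptions: shift_trunc_v4 is a hypothetical name, the four-lane unsigned short input is an illustration rather than the testcase from this bug, and the loops only emulate what the vector statements would compute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane-by-lane model of
     tem = load >> { 8, 0, 8, 0 };
     tem2 = (vector(4) char) tem;
     store = tem2;
   A shift by 0 is the neutral operation for the lanes that have no
   shift in the scalar code.  */
static void
shift_trunc_v4 (const uint16_t load[4], uint8_t store[4])
{
  static const unsigned shift[4] = { 8, 0, 8, 0 };
  uint16_t tem[4];

  /* tem = load >> { 8, 0, 8, 0 };  */
  for (size_t i = 0; i < 4; i++)
    tem[i] = (uint16_t) (load[i] >> shift[i]);

  /* tem2 = (vector(4) char) tem; store = tem2;  */
  for (size_t i = 0; i < 4; i++)
    store[i] = (uint8_t) tem[i];
}
```

For an input whose lanes are { 0x1234, 0x1234, 0xabcd, 0xabcd } the stored bytes are 0x12, 0x34, 0xab, 0xcd: the even lanes keep the high byte and the odd lanes the low byte, so every lane goes through the same shift-then-truncate sequence.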

I'll note that without vectorization the bswap pass doesn't detect the bswap
either, so it seems we do rely on the vector CTOR detection of the bswap
pass.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
