[rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision

Ulrich Weigand Tue, 24 Apr 2012 09:51:33 -0700

Hello,

PR 52633 is caused by a bad interaction between two different vectorizer
pattern recognition passes.  A minimal test case is:


void
test (unsigned short *x, signed char *y)
{
  int i;
  for (i = 0; i < 32; i++)
    x[i] = (short) (y[i] << 5);
}

built with "cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp"
on an arm-linux-gnueabi target.

Before the vectorizer, we have something like:

  short unsigned int D.4976;
  int D.4975;
  int D.4974;
  signed char D.4973;
  signed char * D.4972;
  short unsigned int * D.4970;
[snip]
  D.4973_8 = *D.4972_7;
  D.4974_9 = (int) D.4973_8;
  D.4975_10 = D.4974_9 << 5;
  D.4976_11 = (short unsigned int) D.4975_10;
  *D.4970_5 = D.4976_11;


The pattern recognizer now goes through the list of patterns it tries
to detect.  The first one that matches is vect_recog_over_widening_pattern,
which annotates the statements (via their related_stmt fields):

  D.4973_8 = *D.4972_7;
  D.4974_9 = (int) D.4973_8;
    --> D.4992_26 = (signed short) D.4973_8
  D.4975_10 = D.4974_9 << 5;
    --> patt.16_31 = D.4992_26 << 5
  D.4976_11 = (short unsigned int) D.4975_10;
    --> D.4994_32 = (short unsigned int) patt.16_31
  *D.4970_5 = D.4976_11;


In the next step, vect_recog_widen_shift_pattern *also* matches, and
creates a new annotation for the shift statement (using a widening
shift):

  D.4973_8 = *D.4972_7;
  D.4974_9 = (int) D.4973_8;
    --> D.4992_26 = (signed short) D.4973_8
  D.4975_10 = D.4974_9 << 5;
   [--> patt.16_31 = D.4992_26 << 5]
    --> patt.17_33 = D.4973_8 w<< 5
  D.4976_11 = (short unsigned int) D.4975_10;
    --> D.4994_32 = (short unsigned int) patt.16_31
  *D.4970_5 = D.4976_11;


Since the original statement can only point to a single related_stmt,
the statement setting patt.16_31 is no longer referred to as related_stmt
by any other statement.  As a result, it is no longer considered
relevant by the vectorizer.

However, the statement:
    --> D.4994_32 = (short unsigned int) patt.16_31
*is* still considered relevant.  While analysing it, the vectorizer
follows patt.16_31 back to its defining statement and gets quite
confused to find that this statement has no vectype (because the
vectorizer never considered it).  The symptom is a failing assertion
      gcc_assert (*vectype != NULL_TREE);
in vect_is_simple_use_1.


Now, it seems quite unusual for multiple patterns to match for a
single original statement.  In fact, most pattern recognizers
explicitly refuse to even consider statements that were already
recognized.  However, vect_recog_widen_shift_pattern makes an
exception:

      /* This statement was also detected as over-widening operation (it can't
         be any other pattern, because only over-widening detects shifts).
         LAST_STMT is the final type demotion statement, but its related
         statement is shift.  We analyze the related statement to catch cases:

         orig code:
          type a_t;
          itype res;
          TYPE a_T, res_T;

          S1 a_T = (TYPE) a_t;
          S2 res_T = a_T << CONST;
          S3 res = (itype)res_T;

          (size of type * 2 <= size of itype
           and size of itype * 2 <= size of TYPE)

         code after over-widening pattern detection:

          S1 a_T = (TYPE) a_t;
               --> a_it = (itype) a_t;
          S2 res_T = a_T << CONST;
          S3 res = (itype)res_T;  <--- LAST_STMT
               --> res = a_it << CONST;

         after widen_shift:

          S1 a_T = (TYPE) a_t;
               --> a_it = (itype) a_t; - redundant
          S2 res_T = a_T << CONST;
          S3 res = (itype)res_T;
               --> res = a_t w<< CONST;

      i.e., we replace the three statements with res = a_t w<< CONST.  */


If everything were indeed as described in that comment, things would work out
fine.  However, what the comment calls "code after over-widening pattern
detection" is only one of two possible outcomes of the over-widening pattern;
the other is the one that happens in the current test case, where a final
type conversion is still left over after the over-widening pattern is applied.


I guess one could try to distinguish the two cases somehow and handle both,
but the overall approach seems quite fragile to me.  It doesn't seem really
maintainable to have to rely on so many details of how one particular
pattern detection function operates when writing another one, at the risk
of creating subtle problems like the one described above.

So I was wondering why vect_recog_widen_shift_pattern tries to take advantage
of an already recognized over-widening pattern.  The reason is that if it does
not, it will generate less efficient code in cases like the above test case:
by itself, vect_recog_widen_shift_pattern would generate code to extend the
char to a short, then do a widening shift resulting in an int, followed
by narrowing back down to a short.


Even so, it might be preferable to handle such cases within
vect_recog_widen_shift_pattern itself.  The routine already looks for
another, subsequent type cast in order to handle unsigned shift variants.
Maybe it should simply always look for such a cast, and detect
over-widening situations itself?


Does this look reasonable?  Any comments or suggestions appreciated!

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com
