Hello, PR 52633 is caused by bad interaction between two different vectorizer pattern recognition passed. A minimal test case is:
void test (unsigned short *x, signed char *y) { int i; for (i = 0; i < 32; i++) x[i] = (short) (y[i] << 5); } built with "cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp" on a arm-linux-gnueabi target. Before the vectorizer, we have something like: short unsigned int D.4976; int D.4975; int D.4974; signed char D.4973; signed char * D.4972; short unsigned int * D.4970; [snip] D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; D.4975_10 = D.4974_9 << 5; D.4976_11 = (short unsigned int) D.4975_10; *D.4970_5 = D.4976_11; The pattern recognizer now goes through its list of patterns it tries to detect. The first successful one is vect_recog_over_widening_pattern. This will annotate the statements (via related_stmt fields): D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; --> D.4992_26 = (signed short) D.4973_8 D.4975_10 = D.4974_9 << 5; --> patt.16_31 = D.4992_26 << 5 D.4976_11 = (short unsigned int) D.4975_10; --> D.4994_32 = (short unsigned int) patt.16_31 *D.4970_5 = D.4976_11; In the next step, vect_recog_widen_shift_pattern *also* matches, and creates a new annotation for the shift statement (using a widening shift): D.4973_8 = *D.4972_7; D.4974_9 = (int) D.4973_8; --> D.4992_26 = (signed short) D.4973_8 D.4975_10 = D.4974_9 << 5; [--> patt.16_31 = D.4992_26 << 5] --> patt.17_33 = D.4973_8 w<< 5 D.4976_11 = (short unsigned int) D.4975_10; --> D.4994_32 = (short unsigned int) patt.16_31 *D.4970_5 = D.4976_11; Since the original statement can only point to a single related_stmt, the statement setting patt.16_31 is now longer refered to as related_stmt by any other statement. This causes it to no longer be considered relevant for the vectorizer. However, the statement: --> D.4994_32 = (short unsigned int) patt.16_31 *is* still considered relevant. While analysing it, however, the vectorizer follows through to the def statement for patt.16_31, and gets quite confused to find that it doesn't have a vectype (because it wasn't considered by the vectorizer). The symptom is a failing assertion gcc_assert (*vectype != NULL_TREE); in vect_is_simple_use_1. Now, it seems quite unusual for multiple patterns to match for a single original statement. In fact, most pattern recognizers explicitly refuse to even consider statements that were already recognized. However, vect_recog_widen_shift_pattern makes an exception: /* This statement was also detected as over-widening operation (it can't be any other pattern, because only over-widening detects shifts). LAST_STMT is the final type demotion statement, but its related statement is shift. We analyze the related statement to catch cases: orig code: type a_t; itype res; TYPE a_T, res_T; S1 a_T = (TYPE) a_t; S2 res_T = a_T << CONST; S3 res = (itype)res_T; (size of type * 2 <= size of itype and size of itype * 2 <= size of TYPE) code after over-widening pattern detection: S1 a_T = (TYPE) a_t; --> a_it = (itype) a_t; S2 res_T = a_T << CONST; S3 res = (itype)res_T; <--- LAST_STMT --> res = a_it << CONST; after widen_shift: S1 a_T = (TYPE) a_t; --> a_it = (itype) a_t; - redundant S2 res_T = a_T << CONST; S3 res = (itype)res_T; --> res = a_t w<< CONST; i.e., we replace the three statements with res = a_t w<< CONST. */ If everything were indeed as described in that comment, things would work out fine. However, what is described above as "code after over-widening pattern detection" is only one of two possible outcomes of that latter pattern; the other is the one that happens in the current test case, where we still have a final type conversion left after applying the over-widening pattern. I guess one could try to distiguish the two cases somehow and handle both; but the overall approach seems quite fragile to me; it doesn't look really maintainable to have to rely on so many details of the operation of one particular pattern detection function while writing another one, or else risk creating subtle problems like the one described above. So I was wondering why vect_recog_widen_shift_pattern tries to take advantage of an already recognized over-widening pattern. But indeed, if it does not, it will generate less efficient code in cases like the above test case: by itself vect_recog_widen_shift_pattern, would generate code to expand the char to a short, then do a widening shift resulting in an int, followed by narrowing back down to a short. However, even so, it might actually be preferable to just handle such cases within vect_recog_widen_shift_pattern itself. Indeed, the routine already looks for another subsequent type cast, in order to handle unsigned shift variants. Maybe it simply ought to always look for another cast, and detect over-widening situations itself? Does this look reasonable? Any comments or suggestions appreciated! Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com