Hi Richi,

on 2020/3/10 7:14 PM, Richard Biener wrote:
> On Tue, Mar 10, 2020 at 12:12 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
>>
>> On Tue, Mar 10, 2020 at 7:52 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>>>
>>> Hi all,
>>>
>>> I'm investigating whether GCC can vectorize the below case on ppc64le.
>>>
>>>   extern void test(unsigned int t[4][4]);
>>>
>>>   void foo(unsigned char *p1, int i1, unsigned char *p2, int i2)
>>>   {
>>>     unsigned int tmp[4][4];
>>>     unsigned int a0, a1, a2, a3;
>>>
>>>     for (int i = 0; i < 4; i++, p1 += i1, p2 += i2) {
>>>       a0 = (p1[0] - p2[0]) + ((p1[4] - p2[4]) << 16);
>>>       a1 = (p1[1] - p2[1]) + ((p1[5] - p2[5]) << 16);
>>>       a2 = (p1[2] - p2[2]) + ((p1[6] - p2[6]) << 16);
>>>       a3 = (p1[3] - p2[3]) + ((p1[7] - p2[7]) << 16);
>>>
>>>       int t0 = a0 + a1;
>>>       int t1 = a0 - a1;
>>>       int t2 = a2 + a3;
>>>       int t3 = a2 - a3;
>>>
>>>       tmp[i][0] = t0 + t2;
>>>       tmp[i][2] = t0 - t2;
>>>       tmp[i][1] = t1 + t3;
>>>       tmp[i][3] = t1 - t3;
>>>     }
>>>     test(tmp);
>>>   }
>>> ...
>>> From the above, the key thing is to group tmp[i][j] i=/0,1,2,3/ together,
>>> eg:
>>>   tmp[i][0] i=/0,1,2,3/ (one group)
>>>   tmp[i][1] i=/0,1,2,3/ (one group)
>>>   tmp[i][2] i=/0,1,2,3/ (one group)
>>>   tmp[i][3] i=/0,1,2,3/ (one group)
>>>
>>> which tmp[i][j] group have the same isomorphic computations. But currently
>>> SLP is unable to divide group like this way. (call it as A-way for now)
>>>
>>> It's understandable since it has better adjacent store groups like,
>>>   tmp[0][i] i=/0,1,2,3/ (one group)
>>>   tmp[1][i] i=/0,1,2,3/ (one group)
>>>   tmp[2][i] i=/0,1,2,3/ (one group)
>>>   tmp[3][i] i=/0,1,2,3/ (one group)
>
> Note this is how the non-SLP path will (try to) vectorize the loop.
>

Oops, sorry for the confusion caused by my poor wording. That list was
meant to show how SLP currently groups the 16 stores tmp[i][j]
i,j=/0,1,2,3/ once the loop is completely unrolled. I saw it end up
splitting the 16 stmts into 4 groups in this way.

BR,
Kewen