On Thu, 30 Apr 2015, Jeff Law wrote:

> On 04/30/2015 01:17 AM, Marc Glisse wrote:
>> +/* This is another case of narrowing, specifically when there's an outer
>> +   BIT_AND_EXPR which masks off bits outside the type of the innermost
>> +   operands. Like the previous case we have to convert the operands
>> +   to unsigned types to avoid introducing undefined behaviour for the
>> +   arithmetic operation. */
>> +(for op (minus plus)
>>
>> No mult? or widen_mult with a different pattern? (maybe that's already
>> done elsewhere)
>
> No mult. When I worked on the pattern for 47477, supporting mult clearly
> regressed the generated code -- presumably because we can often widen the
> operands for free.
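
(For context, the kind of source expression this +- narrowing targets looks
roughly like the function below; this is my own illustration, not something
taken from the patch.)

/* Illustration only, not from the patch: a and b are promoted to int by the
   usual arithmetic conversions, and the outer mask keeps exactly the bits
   that fit in the 16-bit type, so the subtraction can be narrowed to
   unsigned short arithmetic without changing the result.  */
unsigned short
narrow_sub (unsigned short a, unsigned short b)
{
  /* Before: ((int) a - (int) b) & 0xffff, computed in int.
     After:  a - b, done directly in unsigned short.  */
  return ((int) a - (int) b) & 0xffff;
}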

Supporting mult would help with the testcase below, but I am willing to
accept that for mult the cases where narrowing hurts are more common (and
guessing in advance whether it will help or hurt may be hard), while for +-
the cases where it helps are more common.

void f(short*a) {
  a = __builtin_assume_aligned(a,128);
  for (int i = 0; i < (1<<22); ++i) {
#ifdef EASY
    a[i] *= a[i];
#else
    int x = a[i];
    x *= x;
    a[i] = x;
#endif
  }
}
With EASY, a nice little loop:
.L2:
movdqa (%rdi), %xmm0
addq $16, %rdi
pmullw %xmm0, %xmm0
movaps %xmm0, -16(%rdi)
cmpq %rdi, %rax
jne .L2

while without EASY, we get this much uglier loop:

.L2:
movdqa (%rdi), %xmm0
addq $16, %rdi
movdqa %xmm0, %xmm2
movdqa %xmm0, %xmm1
pmullw %xmm0, %xmm2
pmulhw %xmm0, %xmm1
movdqa %xmm2, %xmm0
punpckhwd %xmm1, %xmm2
punpcklwd %xmm1, %xmm0
movdqa %xmm2, %xmm1
movdqa %xmm0, %xmm2
punpcklwd %xmm1, %xmm0
punpckhwd %xmm1, %xmm2
movdqa %xmm0, %xmm1
punpcklwd %xmm2, %xmm0
punpckhwd %xmm2, %xmm1
punpcklwd %xmm1, %xmm0
movaps %xmm0, -16(%rdi)
cmpq %rdi, %rax
jne .L2

A small pattern like

(simplify
 (vec_pack_trunc (widen_mult_lo @0 @1) (widen_mult_hi:c @0 @1))
 (mult @0 @1))

probably with some tweaks (convert to unsigned? only do it before vector
lowering?) would fix this particular case, but not as well as narrowing
before vectorization.
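
For what it's worth, the reason such a fold is safe can be checked with a
small scalar model (my own sketch with made-up function names, not GCC code):
a widening multiply followed by truncation yields the same low 16 bits as a
multiply done directly in the narrow element type, and doing that narrow
multiply in unsigned only avoids signed overflow; it does not change the bits.

#include <stdint.h>
#include <stdio.h>

/* Scalar model of widen_mult followed by vec_pack_trunc on one lane:
   compute the full 32-bit product, then keep only the low 16 bits.  */
static uint16_t
widen_then_truncate (int16_t a, int16_t b)
{
  int32_t wide = (int32_t) a * (int32_t) b;
  return (uint16_t) wide;
}

/* Scalar model of the narrowed multiply; the unsigned intermediate keeps the
   C-level arithmetic free of signed overflow, which is what the "convert to
   unsigned?" tweak would take care of at the GIMPLE level.  */
static uint16_t
narrow_mult (int16_t a, int16_t b)
{
  return (uint16_t) ((unsigned) a * (unsigned) b);
}

int
main (void)
{
  for (int a = -32768; a < 32768; a += 97)
    for (int b = -32768; b < 32768; b += 101)
      if (widen_then_truncate ((int16_t) a, (int16_t) b)
          != narrow_mult ((int16_t) a, (int16_t) b))
        {
          printf ("mismatch at %d * %d\n", a, b);
          return 1;
        }
  puts ("widen_mult + pack_trunc matches the narrow multiply");
  return 0;
}
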
--
Marc Glisse