Hi Hongtao,
Many thanks for reviewing the x86_64 pieces.
> if (negate)
> - cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp,
> - CONST0_RTX (GET_MODE (cmp)),
> - NULL, NULL, &negate);
> -
> - gcc_assert (!negate);
> + {
> + if (TARGET_AVX512F && GET_MODE_SIZE (GET_MODE (cmp)) >= 16)
> + cmp = gen_rtx_XOR (GET_MODE (cmp), cmp, CONSTM1_RTX
> (GET_MODE (cmp)));
> + else
> + {
> + cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp,
> + CONST0_RTX (GET_MODE (cmp)),
> + NULL, NULL, &negate);
> + gcc_assert (!negate);
> + }
> + }
>
> Technically it's correct, however, in actual scenarios, avx512 (x86-64-v4)
> will enter ix86_expand_mask_vec_cmp, so this optimization appears to only
> target the scenario of avx512f + no-avx512vl + VL == 16/32, which doesn't
> sound particularly useful.
The mistake in this reasoning is that this function is indeed entered in
actual scenarios.
Consider:
typedef char v16qi __attribute__((vector_size(16)));
v16qi x, y, m;
void foo() { m = x != y; }
which when compiled with -O2 -mavx512vl on mainline currently generates:
foo:    vmovdqa  x(%rip), %xmm0
        vpxor    %xmm1, %xmm1, %xmm1
        vpcmpeqb y(%rip), %xmm0, %xmm0
        vpcmpeqb %xmm1, %xmm0, %xmm0
        vmovdqa  %xmm0, m(%rip)
        ret
which uses vpxor and vpcmpeqb to invert the mask.
With the proposed chunk above, we instead generate:
foo:    vmovdqa    x(%rip), %xmm0
        vpcmpeqb   y(%rip), %xmm0, %xmm0
        vpternlogd $0x55, %xmm0, %xmm0, %xmm0
        vmovdqa    %xmm0, m(%rip)
        ret
Not only is this one instruction shorter (and fewer bytes), but the
not/xor/ternlog can be fused by combine with any following binary
logic, whereas the vpcmpeqb against zero unfortunately can't
(easily) be.
The Bugzilla PR concerns x86_64 using vpcmpeqb to negate masks
when it shouldn't; the example above is exactly the sort of case
it complains about.
I was hoping that the above not/xor/ternlog and a following
blend or pand-pandn-por could eventually be fused into a
single ternlog instruction; i.e., with ternlog the RTL
optimizers (combine) can potentially swap the operands of a
VCOND_MASK without requiring the middle-end's help.
Thanks (again) in advance,
Roger
--