[Bug tree-optimization/70251] Wrong code with -O3 -march=skylake-avx512.

rguenther at suse dot de Mon, 21 Mar 2016 01:38:48 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70251


--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Sat, 19 Mar 2016, glisse at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70251
> 
> --- Comment #9 from Marc Glisse <glisse at gcc dot gnu.org> ---
> I don't find the current situation very consistent. We seem to consider that 
> in
> gimple, the output of a vector comparison can only be used directly in a
> vec_cond_expr (possibly also {and,ior,xor,bit_not}_expr?), except in the very
> specific case where we apply this transformation a+(b?1:0) -> a-VCE(b), where
> we only have a view_convert_expr.

True.  OTOH VIEW_CONVERT_EXPR is quite a special beast.

> Unless we start transforming x?-1:0 into
> view_convert_expr(x) under the same condition (VECTOR_MODE_P(...)), it doesn't
> make sense to me.

I think that would be a valid transform.

> I am also not convinced that VECTOR_MODE_P is the right test.
> If you are using a long vector (say of 2048 bits), the mode will never be a
> vector mode, so at best the transformation is delayed until after vector
> lowering.

True, I was also thinking that a TYPE_SIZE check would be better to
verify the validity of the V_C_E.

> By the way, nothing seems to combine the following into a single
> statement:
>   c_3 = VEC_COND_EXPR <x_1(D) < y_2(D), { -1, -1, -1, -1, -1, -1, -1, -1, -1,
> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
> -1,
> -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
>   _4 = VEC_COND_EXPR <c_3 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, { 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0 }>;
> (it does get combined in forwprop4 after the vector operation is lowered to
> scalar operations (yes, we are missing the code to use sub-vectors here))

This means we are missing a simplifier for

(for cnd (cond vec_cond)
 (for cmp (ne eq)
  (cmp (cnd @0 CONSTANT_CLASS_P@1 CONSTANT_CLASS_P@2)
   ...

possibly also for the scalar COND_EXPR case.

> From a code generation point of view:
> 
> typedef int vec __attribute__((vector_size(32)));
> vec f(vec x,vec y,vec z){
>   vec zero={};
>   vec one=zero+1;
>   vec c=x<y;
>   return z+(c?one:zero);
> }
> 
> -mavx2 generates
> 
>         vpcmpgtd        %ymm0, %ymm1, %ymm1
>         vpsubd  %ymm1, %ymm2, %ymm0
> 
> while -march=skylake-avx512 gives the inferior
> 
>         vpcmpgtd        %ymm0, %ymm1, %ymm1
>         vpandd  .LC0(%rip), %ymm1, %ymm0
>         vpaddd  %ymm2, %ymm0, %ymm0
> 
> 
> 
> (In reply to rguent...@suse.de from comment #5)
> > On Wed, 16 Mar 2016, glisse at gcc dot gnu.org wrote:
> > > A + (B vcmp C ? 1 : 0)  ->  A - (B vcmp C ? -1 : 0)
> > I think that would be an odd transform.
> 
> As far as I understand, since the bool vector changes, this is currently the
> proper gimple syntax for this transformation, with the added bonus that it
> doesn't need to be disabled for avx512.

Sure, but nowhere else we transform a + into a - operation(?)  This
is what I mean with "odd transform".  We'd get this motivated only
by (B vcmp C ? -1 : 0) being "canonical" for target expansion, right?

> (I got "incorrect sharing of tree nodes" on the first argument of 
> vec_cond_expr
> when I wanted to try it out, the match-simplify machinery should probably do
> the unshare_expr automatically, especially since gimplify now wants to keep 
> the
> comparison inside the vec_cond_expr)

Ah, yes, that's sth we need to fix.  Care to share the pattern/testcase
that triggers this?  The place to fix seems a bit non-obvious.

Richard.

[Bug tree-optimization/70251] Wrong code with -O3 -march=skylake-avx512.

Reply via email to