Hi Richard,

Here is an updated patch for the middle-end part of the whole patch,
which I hope addresses all your remarks.

Regression testing and bootstrapping did not show any new failures.
Is it OK for trunk?

Yuri.

ChangeLog:
2015-12-18  Yuri Rumyantsev  <ysrum...@gmail.com>

PR middle-end/68542
* fold-const.c (fold_binary_op_with_conditional_arg): Bail out for case
of mixing vector and scalar types.
(fold_relational_const): Add handling of vector
comparison with boolean result.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
vector types.
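
For reference, the scalar-result semantics assumed for the
fold_relational_const entry can be sketched in plain C. This is only an
illustration, not code from the patch: the enum and function names are
hypothetical, plain arrays stand in for VECTOR_CST operands, and it
assumes that EQ means "all elements are equal" and that NE is its negation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for EQ_EXPR and NE_EXPR.  */
enum cmp_code { CMP_EQ, CMP_NE };

/* Fold a comparison of two constant vectors to a single boolean.
   Assumption: EQ is true only if every element pair is equal,
   and NE is the logical negation of that result.  */
static bool
fold_vector_cmp_to_bool (enum cmp_code code, const int *op0,
                         const int *op1, size_t nelts)
{
  bool all_equal = true;
  for (size_t i = 0; i < nelts; i++)
    all_equal &= (op0[i] == op1[i]);
  return code == CMP_NE ? !all_equal : all_equal;
}
```

Under these assumptions {1,2,3,4} == {1,2,3,4} folds to true, and
{1,2,3,4} != {1,2,0,4} folds to true as well.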

2015-12-16 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Fri, Dec 11, 2015 at 3:03 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard.
>> Thanks for your review.
>> I redesigned the fix for the assert by adding checks for vector
>> comparison with boolean result to fold_binary_op_with_conditional_arg
>> and removing the early exit in combine_cond_expr_cond.
>> Unfortunately, I am not able to provide a test case, since it is
>> in my second patch, related to the back-end patch which I sent earlier
>> (12-08).
>>
>> Bootstrapping and regression testing did not show any new failures.
>> Is it OK for trunk?
>
> +  else if (TREE_CODE (type) == VECTOR_TYPE)
>      {
>        tree testtype = TREE_TYPE (cond);
>        test = cond;
>        true_value = constant_boolean_node (true, testtype);
>        false_value = constant_boolean_node (false, testtype);
>      }
> +  else
> +    {
> +      test = cond;
> +      cond_type = type;
> +      true_value = boolean_true_node;
> +      false_value = boolean_false_node;
> +    }
>
> So this is, say, vec1 != vec2 with scalar vs. vector result.  If we have
> scalar result and thus, say, scalar + vec1 != vec2.  I believe rather
> than doing the above (not seeing how this would not generate wrong
> code eventually) we should simply detect the case of mixing vector
> and scalar types and bail out.  At least without some comments
> your patch makes the function even more difficult to understand than
> it is already.
>
> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>          {
> -          error ("vector comparison returning a boolean");
> -          debug_generic_expr (op0_type);
> -          debug_generic_expr (op1_type);
> -          return true;
> +         /* Allow vector comparison returning boolean if operand types
> +            are boolean or integral and CODE is EQ/NE.  */
> +         if (code != EQ_EXPR && code != NE_EXPR
> +             && !VECTOR_BOOLEAN_TYPE_P (op0_type)
> +             && !VECTOR_INTEGER_TYPE_P (op0_type))
> +           {
> +             error ("type mismatch for vector comparison returning a boolean");
> +             debug_generic_expr (op0_type);
> +             debug_generic_expr (op1_type);
> +             return true;
> +           }
>          }
>      }
>    /* Or a boolean vector type with the same element count
>
> as said before please merge the cascaded if()s.  Better wording for
> the error is "unsupported operation or type for vector comparison
> returning a boolean"
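
As an aside on merging the conditions: one reading of the rule stated in
the patch comment (allow a boolean result only for EQ/NE on boolean or
integral vectors) is sketched below in plain C. The enums and the function
name are hypothetical stand-ins, not GCC identifiers, and the sketch only
models the predicate, not the verifier itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for tree codes and vector element kinds.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_GT };
enum elt_kind { ELT_BOOLEAN, ELT_INTEGER, ELT_FLOAT };

/* Return true when the comparison should be rejected: either the code
   is not EQ/NE, or the operand vector is neither boolean nor integral.  */
static bool
reject_vector_cmp_with_bool_result (enum cmp_code code, enum elt_kind kind)
{
  return (code != CMP_EQ && code != CMP_NE)
         || (kind != ELT_BOOLEAN && kind != ELT_INTEGER);
}
```

Note that under this reading the code check and the type check are joined
with ||; joining all of the tests with && would reject only when both the
code and the type are unsupported.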
>
> Otherwise the patch looks sensible to me though it shows that overloading of
> EQ/NE_EXPR for scalar result and vector operands might have some more
> unexpected fallout (which is why I originally preferred the view-convert
> to large integer type variant).
>
> Thanks,
> Richard.
>
>
>> ChangeLog:
>> 2015-12-11  Yuri Rumyantsev  <ysrum...@gmail.com>
>>
>> PR middle-end/68542
>> * fold-const.c (fold_binary_op_with_conditional_arg): Add checks on
>> vector comparison with boolean result to avoid ICE.
>> (fold_relational_const): Add handling of vector
>> comparison with boolean result.
>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>> comparison of vector operands with boolean result for EQ/NE only.
>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>> (verify_gimple_cond): Likewise.
>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>> combining for non-compatible vector types.
>> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
>> vector types.
>>
>> 2015-12-10 16:36 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Hi Richard.
>>>>
>>>> Thanks a lot for your review.
>>>> Below are my answers.
>>>>
>>>> You asked why I inserted additional check to
>>>> ++ b/gcc/tree-ssa-forwprop.c
>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
>>>>
>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>
>>>> +  /* Do not perform combining if types are not compatible.  */
>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>> +    return NULL_TREE;
>>>> +
>>>>
>>>> again, how does this happen?
>>>>
>>>> This is because without it I hit the assert in fold_convert_loc
>>>>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>>>
>>>> since it tries to convert a vector of bool to a scalar bool.
>>>> Here is the essential part of the call stack:
>>>>
>>>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>>>     at ../../gcc/diagnostic.c:1259
>>>> #1  0x0000000001743ada in fancy_abort (
>>>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>>>> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>>>> ../../gcc/diagnostic.c:1332
>>>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>>>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>>>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>>>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>>>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>>>     at ../../gcc/fold-const.c:9211
>>>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>>     op1=0x7ffff1a48780, invariant_only=true)
>>>>     at ../../gcc/tree-ssa-forwprop.c:382
>>>
>>> Ok, but that only shows that
>>>
>>>       /* Convert A ? 1 : 0 to simply A.  */
>>>       if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>>>                                  : (integer_onep (op1)
>>>                                     && !VECTOR_TYPE_P (type)))
>>>           && integer_zerop (op2)
>>>           /* If we try to convert OP0 to our type, the
>>>              call to fold will try to move the conversion inside
>>>              a COND, which will recurse.  In that case, the COND_EXPR
>>>              is probably the best choice, so leave it alone.  */
>>>           && type == TREE_TYPE (arg0))
>>>         return pedantic_non_lvalue_loc (loc, arg0);
>>>
>>>       /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>>>          over COND_EXPR in cases such as floating point comparisons.  */
>>>       if (integer_zerop (op1)
>>>           && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
>>>                                     : (integer_onep (op2)
>>>                                        && !VECTOR_TYPE_P (type)))
>>>           && truth_value_p (TREE_CODE (arg0)))
>>>         return pedantic_non_lvalue_loc (loc,
>>>                                     fold_convert_loc (loc, type,
>>>                                               invert_truthvalue_loc (loc,
>>>                                                                      arg0)));
>>>
>>> are wrong?  I can't say for sure without a testcase.
>>>
>>> That said, papering over this in tree-ssa-forwprop.c is not the
>>> correct thing to do.
>>>
>>>> Secondly, I did not follow your idea of implementing a GCC vector
>>>> extension for vector comparison with a bool result, since such an
>>>> extension depends entirely on the comparison context. E.g., in your
>>>> example the result type of the comparison depends on its use: for an
>>>> if-comparison it is scalar, but for c = (a==b) the result type is a
>>>> vector. I don't think this is reasonable for the current release.
>>>
>>> The idea was to be able to write testcases exercising different EQ/NE vector
>>> compares.  But yes, if that's non-trivial then it's not appropriate for
>>> stage3.
>>>
>>> Can you add a testcase for the forwprop issue and try to fix the offending
>>> bogus folders instead?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> And finally, about AMD performance: I checked that this transformation
>>>> works with the "-march=bdver4" option, so the regression for 481.wrf
>>>> should disappear too.
>>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Here is a patch for the 481.wrf performance regression for avx2, which
>>>>>> is a slightly modified mask store optimization. This transformation
>>>>>> performs unpredication for a semi-hammock containing masked stores; in
>>>>>> other words, if we have a loop like
>>>>>> for (i=0; i<n; i++)
>>>>>>   if (c[i]) {
>>>>>>     p1[i] += 1;
>>>>>>     p2[i] = p3[i] +2;
>>>>>>   }
>>>>>>
>>>>>> then it will be transformed to
>>>>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>>>    }
>>>>>> i.e. it will move all computations related to masked stores into the
>>>>>> semi-hammock.
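
To make the intent of the transformation concrete, here is a scalar model
in plain C. It is only a sketch under stated assumptions: the function
name is hypothetical, a vectorization factor of 8 is assumed to match the
eight-element mask above, and ordinary loops stand in for MASK_LOAD and
MASK_STORE. The point it illustrates is that the loads, adds and stores
are skipped for a whole chunk when its mask is all zero.

```c
#include <stddef.h>

enum { VF = 8 };  /* assumed vectorization factor */

/* Scalar model of the guarded masked-store block.  */
static void
masked_update (const int *c, int *p1, int *p2, const int *p3, size_t n)
{
  size_t i = 0;
  for (; i + VF <= n; i += VF)
    {
      int mask[VF];
      int any = 0;
      for (size_t l = 0; l < VF; l++)
        {
          mask[l] = (c[i + l] != 0);
          any |= mask[l];
        }
      if (!any)
        continue;  /* mask is all zero: skip the whole block */
      for (size_t l = 0; l < VF; l++)
        if (mask[l])
          {
            p1[i + l] += 1;
            p2[i + l] = p3[i + l] + 2;
          }
    }
  /* Scalar tail for the remaining elements.  */
  for (; i < n; i++)
    if (c[i])
      {
        p1[i] += 1;
        p2[i] = p3[i] + 2;
      }
}
```

The result is the same as for the original scalar loop; only the work done
for chunks with an all-zero mask changes.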
>>>>>>
>>>>>> Bootstrapping and regression testing did not show any new failures.
>>>>>
>>>>> Can you please split out the middle-end support for vector equality compares?
>>>>>
>>>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>>>          {
>>>>> -          error ("vector comparison returning a boolean");
>>>>> -          debug_generic_expr (op0_type);
>>>>> -          debug_generic_expr (op1_type);
>>>>> -          return true;
>>>>> +         /* Allow vector comparison returning boolean if operand types
>>>>> +            are equal and CODE is EQ/NE.  */
>>>>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>>>>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>>>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>>>>> +           {
>>>>> +             error ("type mismatch for vector comparison returning a boolean");
>>>>> +             debug_generic_expr (op0_type);
>>>>> +             debug_generic_expr (op1_type);
>>>>> +             return true;
>>>>> +           }
>>>>>          }
>>>>>      }
>>>>>
>>>>> please merge the conditions with a &&
>>>>>
>>>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
>>>>>
>>>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>>      {
>>>>> +      if (INTEGRAL_TYPE_P (type)
>>>>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>>>>> +             || TYPE_PRECISION (type) == 1))
>>>>> +       {
>>>>> +         /* Have vector comparison with scalar boolean result.  */
>>>>> +         bool result = true;
>>>>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>>>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>>>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>>>> +           {
>>>>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>>>>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>>>>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>>>>> +             result &= integer_onep (tmp);
>>>>> +         if (code == NE_EXPR)
>>>>> +           result = !result;
>>>>> +         return constant_boolean_node (result, type);
>>>>>
>>>>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>>>>> to change the
>>>>> guarding condition to just
>>>>>
>>>>>    if (! VECTOR_TYPE_P (type))
>>>>>
>>>>> and assert the boolean/precision.  Please also merge the asserts into
>>>>> one with &&
>>>>>
>>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>>> index b82ae3c..73ee3be 100644
>>>>> --- a/gcc/tree-ssa-forwprop.c
>>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
>>>>>
>>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>>
>>>>> +  /* Do not perform combining if types are not compatible.  */
>>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>>> +    return NULL_TREE;
>>>>> +
>>>>>
>>>>> again, how does this happen?
>>>>>
>>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>>> index e67048e..1605520c 100644
>>>>> --- a/gcc/tree-vrp.c
>>>>> +++ b/gcc/tree-vrp.c
>>>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
>>>>>                                                 &comp_code, &val))
>>>>>      return;
>>>>>
>>>>> +  /* Use of vector comparison in gcond is very restricted and used to check
>>>>> +     that the mask in masked store is zero, so assert for such comparison
>>>>> +     is not implemented yet.  */
>>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>>> +    return;
>>>>> +
>>>>>
>>>>> VECTOR_TYPE_P
>>>>>
>>>>> I believe the comment should simply say that VRP doesn't track ranges for
>>>>> vector types.
>>>>>
>>>>> In the previous review I suggested you should make sure that RTL expansion
>>>>> ends up using a well-defined optab for these compares.  To make sure
>>>>> this happens across targets I suggest you make these comparisons available
>>>>> via the GCC vector extension.  Thus allow
>>>>>
>>>>> typedef int v4si __attribute__((vector_size(16)));
>>>>>
>>>>> int foo (v4si a, v4si b)
>>>>> {
>>>>>   if (a == b)
>>>>>     return 4;
>>>>> }
>>>>>
>>>>> and != and also using floating point vectors.
>>>>>
>>>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>>>>> compares.
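
For comparison, what the GCC vector extension provides for == today is an
element-wise mask (each lane is -1 when the elements compare equal and 0
otherwise), so a scalar answer has to be reduced by hand. A minimal sketch,
assuming a compiler that supports vector_size and vector subscripting (for
example GCC or Clang); all_lanes_equal is a hypothetical helper, not part
of the patch:

```c
typedef int v4si __attribute__((vector_size(16)));

/* Return 1 if every lane of A equals the corresponding lane of B.  */
static int
all_lanes_equal (v4si a, v4si b)
{
  v4si mask = (a == b);  /* lane is -1 if equal, 0 otherwise */
  for (int i = 0; i < 4; i++)
    if (mask[i] == 0)
      return 0;
  return 1;
}
```

A scalar-result vector compare as discussed in this thread would express
the same test directly, without the explicit reduction loop.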
>>>>>
>>>>> A half-way implementation like your VRP comment suggests (only
>>>>> ==/!= zero against integer vectors is implemented?!) doesn't sound
>>>>> good without also limiting the feature this way in the verifier.
>>>>>
>>>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>>>> has AVX, not AVX2).
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> ChangeLog:
>>>>>> 2015-11-30  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>
>>>>>> PR middle-end/68542
>>>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>>>> comparison with boolean result.
>>>>>> * config/i386/sse.md (define_expand "cbranch<mode>4"): Add define-expand
>>>>>> for vector comparison with eq/ne only.
>>>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>>>> comparison with boolean result.
>>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>>>> (verify_gimple_cond): Likewise.
>>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>>>> combining for non-compatible vector types.
>>>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>>>> (optimize_mask_stores): Likewise.
>>>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>>>> has_mask_store field of vect_info.
>>>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
>>>>>> vectorized loops having masked stores.
>>>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>>>> corresponding macros.
>>>>>> (optimize_mask_stores): Add prototype.
>>>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>>>> type.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

Attachment: PR68542.middle-end.patch3
Description: Binary data
