Re: [PATCH PR68542]

Yuri Rumyantsev Tue, 08 Dec 2015 04:34:58 -0800

Hi Richard,

Here is the second part of the patch.


Is it OK for trunk?

I assume it should also fix the large degradation on 481.wrf with -march=bdver4.

ChangeLog:
2015-12-08  Yuri Rumyantsev  <ysrum...@gmail.com>

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Implement integral vector
comparison with boolean result.
* config/i386/sse.md (define_expand "cbranch<mode>4"): Add define-expand
for vector comparison with eq/ne only.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
corresponding macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
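
For readers of the thread, the effect of the mask store optimization in this part of the patch can be sketched in plain C. This is only a scalar emulation under stated assumptions: VF, mask_any, masked_block and update are hypothetical names, 8 lanes stand in for a 256-bit AVX2 vector of int, and the real transformation works on GIMPLE MASK_LOAD/MASK_STORE calls rather than on source code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { VF = 8 };  /* assumed vectorization factor: 8 int lanes */

/* Hypothetical helper: true if at least one lane of the mask is set.
   This models the "mask != { 0, ... }" test guarding the semi-hammock. */
static bool mask_any(const bool mask[VF])
{
    for (int l = 0; l < VF; l++)
        if (mask[l])
            return true;
    return false;
}

/* Scalar emulation of one vector iteration after the transformation. */
static void masked_block(const int *c, int *p1, int *p2, const int *p3)
{
    bool mask[VF];
    for (int l = 0; l < VF; l++)
        mask[l] = c[l] != 0;

    /* All lanes inactive: skip the masked loads, adds and stores. */
    if (!mask_any(mask))
        return;

    for (int l = 0; l < VF; l++)
        if (mask[l]) {            /* emulates MASK_LOAD / MASK_STORE */
            p1[l] += 1;
            p2[l] = p3[l] + 2;
        }
}

/* Emulates the vectorized loop; n is assumed to be a multiple of VF. */
void update(const int *c, int *p1, int *p2, const int *p3, size_t n)
{
    assert(n % VF == 0);
    for (size_t i = 0; i < n; i += VF)
        masked_block(c + i, p1 + i, p2 + i, p3 + i);
}
```

The guard matters when c[i] is zero for a whole vector of iterations: in that case none of the masked loads, additions or masked stores is executed.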

2015-12-07 13:57 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>:
> Richard!
>
> Here is the middle-end part of the patch with the changes you proposed.
>
> Is it OK for trunk?
>
> Thanks.
> Yuri.
>
> ChangeLog:
> 2015-12-07  Yuri Rumyantsev  <ysrum...@gmail.com>
>
> PR middle-end/68542
> * fold-const.c (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
> vector types.
>
>
>
> 2015-12-04 18:07 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>:
>> Hi Richard.
>>
>> Thanks a lot for your review.
>> Below are my answers.
>>
>> You asked why I inserted an additional check in
>> +++ b/gcc/tree-ssa-forwprop.c
>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>> tree_code code, tree type,
>>
>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>
>> +  /* Do not perform combining it types are not compatible.  */
>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>> +    return NULL_TREE;
>> +
>>
>> again, how does this happen?
>>
>> Without this check I hit the following assert in fold_convert_loc:
>>       gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>
>> because it tries to convert a vector of bool to a scalar bool.
>> Here is the essential part of the call stack:
>>
>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>     at ../../gcc/diagnostic.c:1259
>> #1  0x0000000001743ada in fancy_abort (
>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>> tree_node*)::__FUNCTION__> "fold_convert_loc") at
>> ../../gcc/diagnostic.c:1332
>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>     at ../../gcc/fold-const.c:9211
>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>     op1=0x7ffff1a48780, invariant_only=true)
>>     at ../../gcc/tree-ssa-forwprop.c:382
>>
>>
>> Secondly, I did not pick up your idea of implementing a GCC vector
>> extension for vector comparison with a bool result, since such an
>> extension depends entirely on the context of the comparison. In your
>> example, the result type depends on how the comparison is used: in an
>> if-condition it is scalar, but in c = (a==b) it is a vector. I don't
>> think this is reasonable for the current release.
>>
>> And finally, about AMD performance: I checked that this transformation
>> is also applied with the "-march=bdver4" option, so the 481.wrf
>> regression should disappear there too.
>>
>> Thanks.
>> Yuri.
>>
>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is a patch for the 481.wrf performance regression on AVX2; it is a
>>>> slightly modified mask store optimization. The transformation performs
>>>> unpredication of a semi-hammock containing masked stores. In other
>>>> words, if we have a loop like
>>>> for (i=0; i<n; i++)
>>>>   if (c[i]) {
>>>>     p1[i] += 1;
>>>>     p2[i] = p3[i] +2;
>>>>   }
>>>>
>>>> then it will be transformed to
>>>>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>    }
>>>> i.e. it moves all computations related to the masked stores into the semi-hammock.
>>>>
>>>> Bootstrapping and regression testing did not show any new failures.
>>>
>>> Can you please split out the middle-end support for vector equality compares?
>>>
>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>           || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>          {
>>> -          error ("vector comparison returning a boolean");
>>> -          debug_generic_expr (op0_type);
>>> -          debug_generic_expr (op1_type);
>>> -          return true;
>>> +         /* Allow vector comparison returning boolean if operand types
>>> +            are equal and CODE is EQ/NE.  */
>>> +         if ((code != EQ_EXPR && code != NE_EXPR)
>>> +             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>> +                  || VECTOR_INTEGER_TYPE_P (op0_type)))
>>> +           {
>>> +             error ("type mismatch for vector comparison returning a boolean");
>>> +             debug_generic_expr (op0_type);
>>> +             debug_generic_expr (op1_type);
>>> +             return true;
>>> +           }
>>>          }
>>>      }
>>>
>>> please merge the conditions with a &&
>>>
>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>>> tree type, tree op0, tree op1)
>>>
>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>      {
>>> +      if (INTEGRAL_TYPE_P (type)
>>> +         && (TREE_CODE (type) == BOOLEAN_TYPE
>>> +             || TYPE_PRECISION (type) == 1))
>>> +       {
>>> +         /* Have vector comparison with scalar boolean result.  */
>>> +         bool result = true;
>>> +         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>> +         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>> +         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>> +           {
>>> +             tree elem0 = VECTOR_CST_ELT (op0, i);
>>> +             tree elem1 = VECTOR_CST_ELT (op1, i);
>>> +             tree tmp = fold_relational_const (code, type, elem0, elem1);
>>> +             result &= integer_onep (tmp);
>>> +         if (code == NE_EXPR)
>>> +           result = !result;
>>> +         return constant_boolean_node (result, type);
>>>
>>> ... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
>>> to change the
>>> guarding condition to just
>>>
>>>    if (! VECTOR_TYPE_P (type))
>>>
>>> and assert the boolean/precision.  Please also merge the asserts into
>>> one with &&
>>>
>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>> index b82ae3c..73ee3be 100644
>>> --- a/gcc/tree-ssa-forwprop.c
>>> +++ b/gcc/tree-ssa-forwprop.c
>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>>> tree_code code, tree type,
>>>
>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>
>>> +  /* Do not perform combining it types are not compatible.  */
>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>> +    return NULL_TREE;
>>> +
>>>
>>> again, how does this happen?
>>>
>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>> index e67048e..1605520c 100644
>>> --- a/gcc/tree-vrp.c
>>> +++ b/gcc/tree-vrp.c
>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>>> gimple_stmt_iterator si,
>>>                                                 &comp_code, &val))
>>>      return;
>>>
>>> +  /* Use of vector comparison in gcond is very restricted and used to check
>>> +     that the mask in masked store is zero, so assert for such comparison
>>> +     is not implemented yet.  */
>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>> +    return;
>>> +
>>>
>>> VECTOR_TYPE_P
>>>
>>> I believe the comment should simply say that VRP doesn't track ranges for
>>> vector types.
>>>
>>> In the previous review I suggested you should make sure that RTL expansion
>>> ends up using a well-defined optab for these compares.  To make sure
>>> this happens across targets I suggest you make these comparisons available
>>> via the GCC vector extension.  Thus allow
>>>
>>> typedef int v4si __attribute__((vector_size(16)));
>>>
>>> int foo (v4si a, v4si b)
>>> {
>>>   if (a == b)
>>>     return 4;
>>> }
>>>
>>> and != and also using floating point vectors.
>>>
>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>>> compares.
>>>
>>> A half-way implementation, as your VRP comment suggests (only
>>> ==/!= zero against integer vectors is implemented?!), doesn't sound
>>> good unless the verifier also limits the feature in the same way.
>>>
>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>> has AVX, not AVX2).
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog:
>>>> 2015-11-30  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>
>>>> PR middle-end/68542
>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>> comparison with boolean result.
>>>> * config/i386/sse.md (define_expand "cbranch<mode>4"): Add define-expand
>>>> for vector comparison with eq/ne only.
>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>> comparison with boolean result.
>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
>>>> (verify_gimple_cond): Likewise.
>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>> combining for non-compatible vector types.
>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>> (optimize_mask_stores): Likewise.
>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>> has_mask_store field of vect_info.
>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
>>>> vectorized loops having masked stores.
>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>> corresponding macros.
>>>> (optimize_mask_stores): Add prototype.
>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>> type.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
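
As a footnote to the vector extension discussion quoted above: with the existing GCC vector extension, a == b on vectors yields a vector mask (-1 in each equal lane, 0 elsewhere), so a scalar answer has to be reduced by hand. The sketch below does that for the v4si example. The helper names v4si_all_eq and v4si_any_ne are hypothetical, and reading == as "all lanes equal" and != as its negation is an assumption about the intended semantics, not something the thread settles.

```c
#include <stdbool.h>

/* GCC/Clang vector extension: four 32-bit ints in 16 bytes. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: true only if every lane of a equals b. */
bool v4si_all_eq(v4si a, v4si b)
{
    v4si m = (a == b);   /* lane-wise compare: -1 where equal, 0 otherwise */
    return (m[0] & m[1] & m[2] & m[3]) != 0;
}

/* Hypothetical helper: true if at least one lane differs
   (assumed to be the scalar meaning of a != b). */
bool v4si_any_ne(v4si a, v4si b)
{
    return !v4si_all_eq(a, b);
}

/* The example from the thread, written with the explicit reduction. */
int foo(v4si a, v4si b)
{
    if (v4si_all_eq(a, b))
        return 4;
    return 0;
}
```

A direct if (a == b) on vectors, as proposed in the review, would make this reduction implicit and leave the choice of instruction sequence to the target.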

PR68542.patch2
Description: Binary data
