Hi Richard,

Did you have any chance to look at the updated patch?
Thanks.
Yuri.

2015-12-18 13:20 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>:
> Hi Richard,
>
> Here is the updated patch for the middle-end part of the whole patch,
> which I hope fixes all your remarks.
>
> Regression testing and bootstrapping did not show any new failures.
> Is it OK for trunk?
>
> Yuri.
>
> ChangeLog:
> 2015-12-18  Yuri Rumyantsev  <ysrum...@gmail.com>
>
> PR middle-end/68542
> * fold-const.c (fold_binary_op_with_conditional_arg): Bail out for the
> case of mixing vector and scalar types.
> (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call to verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
> vector types.
>
> 2015-12-16 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Fri, Dec 11, 2015 at 3:03 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>> Thanks for your review.
>>> I redesigned the fix for the assert by adding additional checks for
>>> vector comparison with boolean result to
>>> fold_binary_op_with_conditional_arg, and removed the early exit from
>>> combine_cond_expr_cond.
>>> Unfortunately, I am not able to provide you with a test case, since it
>>> is in my second patch, related to the back-end patch which I sent
>>> earlier (12-08).
>>>
>>> Bootstrapping and regression testing did not show any new failures.
>>> Is it OK for trunk?
>>
>> +  else if (TREE_CODE (type) == VECTOR_TYPE)
>>      {
>>        tree testtype = TREE_TYPE (cond);
>>        test = cond;
>>        true_value = constant_boolean_node (true, testtype);
>>        false_value = constant_boolean_node (false, testtype);
>>      }
>> +  else
>> +    {
>> +      test = cond;
>> +      cond_type = type;
>> +      true_value = boolean_true_node;
>> +      false_value = boolean_false_node;
>> +    }
>>
>> So this is, say, vec1 != vec2 with a scalar vs. vector result, i.e. we
>> have a scalar result and thus, say, scalar + vec1 != vec2.  I believe
>> rather than doing the above (I don't see how this would not eventually
>> generate wrong code) we should simply detect the case of mixing vector
>> and scalar types and bail out.  At least without some comments your
>> patch makes the function even more difficult to understand than it
>> already is.
>>
>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>            || TREE_CODE (op1_type) == VECTOR_TYPE)
>>          {
>> -          error ("vector comparison returning a boolean");
>> -          debug_generic_expr (op0_type);
>> -          debug_generic_expr (op1_type);
>> -          return true;
>> +          /* Allow vector comparison returning boolean if operand types
>> +             are boolean or integral and CODE is EQ/NE.  */
>> +          if (code != EQ_EXPR && code != NE_EXPR
>> +              && !VECTOR_BOOLEAN_TYPE_P (op0_type)
>> +              && !VECTOR_INTEGER_TYPE_P (op0_type))
>> +            {
>> +              error ("type mismatch for vector comparison returning a boolean");
>> +              debug_generic_expr (op0_type);
>> +              debug_generic_expr (op1_type);
>> +              return true;
>> +            }
>>          }
>>      }
>>    /* Or a boolean vector type with the same element count
>>
>> As said before, please merge the cascaded if()s.  Better wording for
>> the error is "unsupported operation or type for vector comparison
>> returning a boolean".
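(For illustration, a minimal sketch of what the merged check could look like, using the suggested wording and the allow-only-EQ/NE intent stated in the comment; this is an assumed shape based on the hunks above, not the committed change:)

    /* Allow vector comparison returning a boolean only for EQ/NE on
       boolean or integral vector operands.  */
    if ((TREE_CODE (op0_type) == VECTOR_TYPE
         || TREE_CODE (op1_type) == VECTOR_TYPE)
        && ((code != EQ_EXPR && code != NE_EXPR)
            || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
                 || VECTOR_INTEGER_TYPE_P (op0_type))))
      {
        error ("unsupported operation or type for vector comparison"
               " returning a boolean");
        debug_generic_expr (op0_type);
        debug_generic_expr (op1_type);
        return true;
      }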
>>
>> Otherwise the patch looks sensible to me, though it shows that
>> overloading EQ/NE_EXPR for a scalar result and vector operands might
>> have some more unexpected fallout (which is why I originally preferred
>> the view-convert to large integer type variant).
>>
>> Thanks,
>> Richard.
>>
>>
>>> ChangeLog:
>>> 2015-12-11  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>
>>> PR middle-end/68542
>>> * fold-const.c (fold_binary_op_with_conditional_arg): Add checks of
>>> vector comparison with boolean result to avoid ICE.
>>> (fold_relational_const): Add handling of vector
>>> comparison with boolean result.
>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>> comparison of vector operands with boolean result for EQ/NE only.
>>> (verify_gimple_assign_binary): Adjust call to verify_gimple_comparison.
>>> (verify_gimple_cond): Likewise.
>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>> combining for non-compatible vector types.
>>> * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
>>> vector types.
>>>
>>> 2015-12-10 16:36 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> Thanks a lot for your review.
>>>>> Below are my answers.
>>>>>
>>>>> You asked why I inserted the additional check in
>>>>> ++ b/gcc/tree-ssa-forwprop.c
>>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
>>>>>
>>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>>
>>>>> +  /* Do not perform combining if types are not compatible.  */
>>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>>> +    return NULL_TREE;
>>>>> +
>>>>>
>>>>> "again, how does this happen?"
>>>>>
>>>>> This is because without it I get an assert in fold_convert_loc:
>>>>>   gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>>>>               && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>>>> since it tries to convert a vector of bool to a scalar bool.
>>>>> Here is the essential part of the call stack:
>>>>>
>>>>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>>>>>     at ../../gcc/diagnostic.c:1259
>>>>> #1  0x0000000001743ada in fancy_abort (
>>>>>     file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>>>>>     function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
>>>>>     tree_node*)::__FUNCTION__> "fold_convert_loc")
>>>>>     at ../../gcc/diagnostic.c:1332
>>>>> #2  0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
>>>>>     arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
>>>>> #3  0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
>>>>> #4  0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
>>>>>     op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
>>>>> #5  0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
>>>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>>>     op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
>>>>>     cond_first_p=1) at ../../gcc/fold-const.c:6465
>>>>> #6  0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>>>>>     type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
>>>>>     at ../../gcc/fold-const.c:9211
>>>>> #7  0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
>>>>>     code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
>>>>>     op1=0x7ffff1a48780, invariant_only=true)
>>>>>     at ../../gcc/tree-ssa-forwprop.c:382
>>>>
>>>> Ok, but that only shows that
>>>>
>>>>       /* Convert A ? 1 : 0 to simply A.  */
>>>>       if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>>>>                                  : (integer_onep (op1)
>>>>                                     && !VECTOR_TYPE_P (type)))
>>>>           && integer_zerop (op2)
>>>>           /* If we try to convert OP0 to our type, the
>>>>              call to fold will try to move the conversion inside
>>>>              a COND, which will recurse.  In that case, the COND_EXPR
>>>>              is probably the best choice, so leave it alone.  */
>>>>           && type == TREE_TYPE (arg0))
>>>>         return pedantic_non_lvalue_loc (loc, arg0);
>>>>
>>>>       /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>>>>          over COND_EXPR in cases such as floating point comparisons.  */
>>>>       if (integer_zerop (op1)
>>>>           && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
>>>>                                     : (integer_onep (op2)
>>>>                                        && !VECTOR_TYPE_P (type)))
>>>>           && truth_value_p (TREE_CODE (arg0)))
>>>>         return pedantic_non_lvalue_loc (loc,
>>>>                                         fold_convert_loc (loc, type,
>>>>                                                           invert_truthvalue_loc (loc, arg0)));
>>>>
>>>> are wrong?  I can't say for sure without a testcase.
>>>>
>>>> That said, papering over this in tree-ssa-forwprop.c is not the
>>>> correct thing to do.
>>>>
>>>>> Secondly, I did not catch your idea of implementing the GCC vector
>>>>> extension for vector comparison with a bool result, since such an
>>>>> extension completely depends on the comparison context: e.g. for your
>>>>> example, the result type of the comparison depends on its use; for an
>>>>> if-comparison it is scalar, but for c = (a == b) the result type is a
>>>>> vector.  I don't think this is reasonable for the current release.
>>>>
>>>> The idea was to be able to write testcases exercising different EQ/NE
>>>> vector compares.  But yes, if that's non-trivial then it's not
>>>> appropriate for stage3.
>>>>
>>>> Can you add a testcase for the forwprop issue and try to fix the
>>>> offending bogus folders instead?
>>>>
>>>> Thanks,
>>>> Richard.
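(A sketch of the bail-out that the 2015-12-18 ChangeLog above describes for fold_binary_op_with_conditional_arg; the shape is inferred from that entry and is not the actual committed hunk:)

    /* Punt on mixing vector and scalar types: building a VEC_COND_EXPR
       here and later fold_convert-ing a vector of booleans to a scalar
       boolean is exactly what trips the assert in the backtrace above.  */
    if (VECTOR_TYPE_P (TREE_TYPE (cond)) != VECTOR_TYPE_P (type))
      return NULL_TREE;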
>>>>> And finally, about AMD performance: I checked that this transformation
>>>>> works with the "-march=bdver4" option, so the regression for 481.wrf
>>>>> should disappear there too.
>>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Here is a patch for the 481.wrf performance regression for AVX2; it
>>>>>>> is a slightly modified mask store optimization.  This transformation
>>>>>>> allows us to un-predicate a semi-hammock containing masked stores; in
>>>>>>> other words, if we have a loop like
>>>>>>>   for (i=0; i<n; i++)
>>>>>>>     if (c[i]) {
>>>>>>>       p1[i] += 1;
>>>>>>>       p2[i] = p3[i] + 2;
>>>>>>>     }
>>>>>>>
>>>>>>> then it will be transformed to
>>>>>>>   if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>>>>>     vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>>>>>>     vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>>>>>     MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>>>>>>     vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>>>>>>     vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>>>>>     MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>>>>>>   }
>>>>>>> i.e. it will put all computations related to the masked stores into
>>>>>>> the semi-hammock.
>>>>>>>
>>>>>>> Bootstrapping and regression testing did not show any new failures.
>>>>>>
>>>>>> Can you please split out the middle-end support for vector equality
>>>>>> compares?
>>>>>>
>>>>>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
>>>>>>        if (TREE_CODE (op0_type) == VECTOR_TYPE
>>>>>>            || TREE_CODE (op1_type) == VECTOR_TYPE)
>>>>>>          {
>>>>>> -          error ("vector comparison returning a boolean");
>>>>>> -          debug_generic_expr (op0_type);
>>>>>> -          debug_generic_expr (op1_type);
>>>>>> -          return true;
>>>>>> +          /* Allow vector comparison returning boolean if operand types
>>>>>> +             are equal and CODE is EQ/NE.  */
>>>>>> +          if ((code != EQ_EXPR && code != NE_EXPR)
>>>>>> +              || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>>>>> +                   || VECTOR_INTEGER_TYPE_P (op0_type)))
>>>>>> +            {
>>>>>> +              error ("type mismatch for vector comparison returning a boolean");
>>>>>> +              debug_generic_expr (op0_type);
>>>>>> +              debug_generic_expr (op1_type);
>>>>>> +              return true;
>>>>>> +            }
>>>>>>          }
>>>>>>      }
>>>>>>
>>>>>> please merge the conditions with a &&
>>>>>>
>>>>>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
>>>>>>
>>>>>>    if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>>>      {
>>>>>> +      if (INTEGRAL_TYPE_P (type)
>>>>>> +          && (TREE_CODE (type) == BOOLEAN_TYPE
>>>>>> +              || TYPE_PRECISION (type) == 1))
>>>>>> +        {
>>>>>> +          /* Have vector comparison with scalar boolean result.  */
>>>>>> +          bool result = true;
>>>>>> +          gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>>>>>> +          gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>>>>>> +          for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>>>>>> +            {
>>>>>> +              tree elem0 = VECTOR_CST_ELT (op0, i);
>>>>>> +              tree elem1 = VECTOR_CST_ELT (op1, i);
>>>>>> +              tree tmp = fold_relational_const (code, type, elem0, elem1);
>>>>>> +              result &= integer_onep (tmp);
>>>>>> +              if (code == NE_EXPR)
>>>>>> +                result = !result;
>>>>>> +              return constant_boolean_node (result, type);
>>>>>>
>>>>>> ... this just assumes it is either EQ_EXPR or NE_EXPR.  I believe you
>>>>>> want to change the guarding condition to just
>>>>>>
>>>>>>   if (! VECTOR_TYPE_P (type))
>>>>>>
>>>>>> and assert the boolean/precision.  Please also merge the asserts into
>>>>>> one with &&.
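(Applying those two remarks, a possible corrected shape of the hunk would be the sketch below; note also that the quoted hunk returns from inside the loop, so it would only test the first element pair. This is an illustration, not the committed code:)

    if (!VECTOR_TYPE_P (type))
      {
        /* Have vector comparison with scalar boolean result.  */
        gcc_assert ((code == EQ_EXPR || code == NE_EXPR)
                    && VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
        bool result = true;
        for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
          {
            tree elem0 = VECTOR_CST_ELT (op0, i);
            tree elem1 = VECTOR_CST_ELT (op1, i);
            tree tmp = fold_relational_const (EQ_EXPR, type, elem0, elem1);
            /* EQ holds only if every element pair compares equal.  */
            result &= integer_onep (tmp);
          }
        /* Invert once, after the loop, for NE.  */
        if (code == NE_EXPR)
          result = !result;
        return constant_boolean_node (result, type);
      }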
>>>>>>
>>>>>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>>>>>> index b82ae3c..73ee3be 100644
>>>>>> --- a/gcc/tree-ssa-forwprop.c
>>>>>> +++ b/gcc/tree-ssa-forwprop.c
>>>>>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
>>>>>>
>>>>>>    gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>>>>>
>>>>>> +  /* Do not perform combining if types are not compatible.  */
>>>>>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>>>>>> +      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>>>>>> +    return NULL_TREE;
>>>>>> +
>>>>>>
>>>>>> again, how does this happen?
>>>>>>
>>>>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>>>>> index e67048e..1605520c 100644
>>>>>> --- a/gcc/tree-vrp.c
>>>>>> +++ b/gcc/tree-vrp.c
>>>>>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
>>>>>>                                  &comp_code, &val))
>>>>>>      return;
>>>>>>
>>>>>> +  /* Use of vector comparison in gcond is very restricted and used to
>>>>>> +     check that the mask in a masked store is zero, so an assert for
>>>>>> +     such a comparison is not implemented yet.  */
>>>>>> +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>>>>>> +    return;
>>>>>> +
>>>>>>
>>>>>> VECTOR_TYPE_P
>>>>>>
>>>>>> I believe the comment should simply say that VRP doesn't track ranges
>>>>>> for vector types.
>>>>>>
>>>>>> In the previous review I suggested you should make sure that RTL
>>>>>> expansion ends up using a well-defined optab for these compares.  To
>>>>>> make sure this happens across targets I suggest you make these
>>>>>> comparisons available via the GCC vector extension.  Thus allow
>>>>>>
>>>>>> typedef int v4si __attribute__((vector_size(16)));
>>>>>>
>>>>>> int foo (v4si a, v4si b)
>>>>>> {
>>>>>>   if (a == b)
>>>>>>     return 4;
>>>>>> }
>>>>>>
>>>>>> and also !=, and also using floating-point vectors.
>>>>>>
>>>>>> Otherwise it's hard to see the impact of this change.  Obvious choices
>>>>>> are the eq/ne optabs for FP compares and the [u]cmp optabs for integer
>>>>>> compares.
>>>>>>
>>>>>> A half-way implementation like your VRP comment suggests (only ==/!=
>>>>>> against zero for integer vectors is implemented?!) doesn't sound good
>>>>>> without also limiting the feature this way in the verifier.
>>>>>>
>>>>>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>>>>>> has AVX, not AVX2).
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
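(Extending Richard's snippet to != and floating-point vectors, as he asks: this is hypothetical test code assuming the proposed vector-extension support for scalar-result compares is in place, not part of the posted patch:)

    typedef int v4si __attribute__((vector_size(16)));
    typedef float v4sf __attribute__((vector_size(16)));

    int foo_ne (v4si a, v4si b)
    {
      if (a != b)   /* Integer vector inequality with a scalar result.  */
        return 4;
      return 0;
    }

    int foo_eq_fp (v4sf x, v4sf y)
    {
      if (x == y)   /* FP vector equality with a scalar result.  */
        return 4;
      return 0;
    }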
>>>>>>> ChangeLog:
>>>>>>> 2015-11-30  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>
>>>>>>> PR middle-end/68542
>>>>>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>>>>>> comparison with boolean result.
>>>>>>> * config/i386/sse.md (define_expand "cbranch<mode>4"): Add define_expand
>>>>>>> for vector comparison with eq/ne only.
>>>>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>>>>> comparison with boolean result.
>>>>>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>>>>>> comparison of vector operands with boolean result for EQ/NE only.
>>>>>>> (verify_gimple_assign_binary): Adjust call to verify_gimple_comparison.
>>>>>>> (verify_gimple_cond): Likewise.
>>>>>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>>>>>> combining for non-compatible vector types.
>>>>>>> * tree-vect-loop.c (is_valid_sink): New function.
>>>>>>> (optimize_mask_stores): Likewise.
>>>>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>>>>> has_mask_store field of vect_info.
>>>>>>> * tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
>>>>>>> vectorized loops having masked stores.
>>>>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>>>>> corresponding macros.
>>>>>>> (optimize_mask_stores): Add prototype.
>>>>>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>>>>>> type.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
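(The new test itself is not quoted anywhere in the thread; based on the loop discussed above, a plausible sketch of gcc.target/i386/avx2-vect-mask-store-move1.c might look as follows. The dg-options and the dump-scan pattern are guesses, not the actual test:)

    /* { dg-do compile } */
    /* { dg-options "-O3 -mavx2 -fdump-tree-vect-details" } */

    void
    foo (int *p1, int *p2, int *p3, int *c, int n)
    {
      int i;
      for (i = 0; i < n; i++)
        if (c[i])
          {
            p1[i] += 1;
            p2[i] = p3[i] + 2;
          }
    }

    /* Expect the masked stores and their computations to end up inside one
       block guarded by a zero-mask test; the scan below only checks that
       masked stores were generated and is a hypothetical placeholder.  */
    /* { dg-final { scan-tree-dump "MASK_STORE" "vect" } } */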