Hi Richard, Here is the second part of patch.
Is it OK for trunk? I assume that it should fix huge degradation on 481.wrf for -march=bdver4 also. ChangeLog: 2015-12-08 Yuri Rumyantsev <ysrum...@gmail.com> PR middle-end/68542 * config/i386/i386.c (ix86_expand_branch): Implement integral vector comparison with boolean result. * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand for vector comparion with eq/ne only. * tree-vect-loop.c (is_valid_sink): New function. (optimize_mask_stores): Likewise. * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize has_mask_store field of vect_info. * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for vectorized loops having masked stores. * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and correspondent macros. (optimize_mask_stores): Add prototype. gcc/testsuite/ChangeLog: * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. 2015-12-07 13:57 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>: > Richard! > > Here is middle-end part of patch with changes proposed by you. > > Is it OK for trunk? > > Thanks. > Yuri. > > ChangeLog: > 2015-12-07 Yuri Rumyantsev <ysrum...@gmail.com> > > PR middle-end/68542 > * fold-const.c (fold_relational_const): Add handling of vector > comparison with boolean result. > * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow > comparison of vector operands with boolean result for EQ/NE only. > (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison. > (verify_gimple_cond): Likewise. > * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform > combining for non-compatible vector types. > * tree-vrp.c (register_edge_assert_for): VRP does not track ranges for > vector types. > > > > 2015-12-04 18:07 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>: >> Hi Richard. >> >> Thanks a lot for your review. >> Below are my answers. >> >> You asked why I inserted additional check to >> ++ b/gcc/tree-ssa-forwprop.c >> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum >> tree_code code, tree type, >> >> gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison); >> >> + /* Do not perform combining it types are not compatible. */ >> + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE >> + && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE >> (op0)))) >> + return NULL_TREE; >> + >> >> again, how does this happen? >> >> This is because without it I've got assert in fold_convert_loc >> gcc_assert (TREE_CODE (orig) == VECTOR_TYPE >> && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig))); >> >> since it tries to convert vector of bool to scalar bool. >> Here is essential part of call-stack: >> >> #0 internal_error (gmsgid=0x1e48397 "in %s, at %s:%d") >> at ../../gcc/diagnostic.c:1259 >> #1 0x0000000001743ada in fancy_abort ( >> file=0x1847fc3 "../../gcc/fold-const.c", line=2217, >> function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*, >> tree_node*)::__FUNCTION__> "fold_convert_loc") at >> ../../gcc/diagnostic.c:1332 >> #2 0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20, >> arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216 >> #3 0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR, >> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000, >> op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453 >> #4 0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR, >> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000, >> op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394 >> #5 0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0, >> code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460, >> op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780, >> cond_first_p=1) at ../../gcc/fold-const.c:6465 >> #6 0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR, >> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780) >> at ../../gcc/fold-const.c:9211 >> #7 0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0, >> code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460, >> op1=0x7ffff1a48780, invariant_only=true) >> at ../../gcc/tree-ssa-forwprop.c:382 >> >> >> Secondly, I did not catch your idea to implement GCC Vector Extension >> for vector comparison with bool result since >> such extension completely depends on comparison context, e.g. for your >> example, result type of comparison depends on using - for >> if-comparison it is scalar, but for c = (a==b) - result type is >> vector. I don't think that this is reasonable for current release. >> >> And finally about AMD performance. I checked that this transformation >> works for "-march=bdver4" option and regression for 481.wrf must >> disappear too. >> >> Thanks. >> Yuri. >> >> 2015-12-04 15:18 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Hi All, >>>> >>>> Here is a patch for 481.wrf preformance regression for avx2 which is >>>> sligthly modified mask store optimization. This transformation allows >>>> perform unpredication for semi-hammock containing masked stores, other >>>> words if we have a loop like >>>> for (i=0; i<n; i++) >>>> if (c[i]) { >>>> p1[i] += 1; >>>> p2[i] = p3[i] +2; >>>> } >>>> >>>> then it will be transformed to >>>> if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) { >>>> vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, >>>> mask__ifc__42.18_165); >>>> vect__12.22_172 = vect__11.19_170 + vect_cst__171; >>>> MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, >>>> vect__12.22_172); >>>> vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, >>>> mask__ifc__42.18_165); >>>> vect__19.28_184 = vect__18.25_182 + vect_cst__183; >>>> MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, >>>> vect__19.28_184); >>>> } >>>> i.e. it will put all computations related to masked stores to semi-hammock. >>>> >>>> Bootstrapping and regression testing did not show any new failures. >>> >>> Can you please split out the middle-end support for vector equality >>> compares? >>> >>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree >>> op1) >>> if (TREE_CODE (op0_type) == VECTOR_TYPE >>> || TREE_CODE (op1_type) == VECTOR_TYPE) >>> { >>> - error ("vector comparison returning a boolean"); >>> - debug_generic_expr (op0_type); >>> - debug_generic_expr (op1_type); >>> - return true; >>> + /* Allow vector comparison returning boolean if operand types >>> + are equal and CODE is EQ/NE. */ >>> + if ((code != EQ_EXPR && code != NE_EXPR) >>> + || !(VECTOR_BOOLEAN_TYPE_P (op0_type) >>> + || VECTOR_INTEGER_TYPE_P (op0_type))) >>> + { >>> + error ("type mismatch for vector comparison returning a >>> boolean"); >>> + debug_generic_expr (op0_type); >>> + debug_generic_expr (op1_type); >>> + return true; >>> + } >>> } >>> } >>> >>> please merge the conditions with a && >>> >>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code, >>> tree type, tree op0, tree op1) >>> >>> if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST) >>> { >>> + if (INTEGRAL_TYPE_P (type) >>> + && (TREE_CODE (type) == BOOLEAN_TYPE >>> + || TYPE_PRECISION (type) == 1)) >>> + { >>> + /* Have vector comparison with scalar boolean result. */ >>> + bool result = true; >>> + gcc_assert (code == EQ_EXPR || code == NE_EXPR); >>> + gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1)); >>> + for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++) >>> + { >>> + tree elem0 = VECTOR_CST_ELT (op0, i); >>> + tree elem1 = VECTOR_CST_ELT (op1, i); >>> + tree tmp = fold_relational_const (code, type, elem0, elem1); >>> + result &= integer_onep (tmp); >>> + if (code == NE_EXPR) >>> + result = !result; >>> + return constant_boolean_node (result, type); >>> >>> ... just assumes it is either EQ_EXPR or NE_EXPR. I believe you want >>> to change the >>> guarding condition to just >>> >>> if (! VECTOR_TYPE_P (type)) >>> >>> and assert the boolean/precision. Please also merge the asserts into >>> one with && >>> >>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c >>> index b82ae3c..73ee3be 100644 >>> --- a/gcc/tree-ssa-forwprop.c >>> +++ b/gcc/tree-ssa-forwprop.c >>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum >>> tree_code code, tree type, >>> >>> gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison); >>> >>> + /* Do not perform combining it types are not compatible. */ >>> + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE >>> + && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE >>> (op0)))) >>> + return NULL_TREE; >>> + >>> >>> again, how does this happen? >>> >>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c >>> index e67048e..1605520c 100644 >>> --- a/gcc/tree-vrp.c >>> +++ b/gcc/tree-vrp.c >>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e, >>> gimple_stmt_iterator si, >>> &comp_code, &val)) >>> return; >>> >>> + /* Use of vector comparison in gcond is very restricted and used to check >>> + that the mask in masked store is zero, so assert for such comparison >>> + is not implemented yet. */ >>> + if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE) >>> + return; >>> + >>> >>> VECTOR_TYPE_P >>> >>> I believe the comment should simply say that VRP doesn't track ranges for >>> vector types. >>> >>> In the previous review I suggested you should make sure that RTL expansion >>> ends up using a well-defined optab for these compares. To make sure >>> this happens across targets I suggest you make these comparisons available >>> via the GCC vector extension. Thus allow >>> >>> typedef int v4si __attribute__((vector_size(16))); >>> >>> int foo (v4si a, v4si b) >>> { >>> if (a == b) >>> return 4; >>> } >>> >>> and != and also using floating point vectors. >>> >>> Otherwise it's hard to see the impact of this change. Obvious choices >>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer >>> compares. >>> >>> A half-way implementation like your VRP comment suggests (only >>> ==/!= zero against integer vectors is implemented?!) this doesn't sound >>> good without also limiting the feature this way in the verifier. >>> >>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only >>> has AVX, not AVX2). >>> >>> Thanks, >>> Richard. >>> >>>> ChangeLog: >>>> 2015-11-30 Yuri Rumyantsev <ysrum...@gmail.com> >>>> >>>> PR middle-end/68542 >>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector >>>> comparison with boolean result. >>>> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand >>>> for vector comparion with eq/ne only. >>>> * fold-const.c (fold_relational_const): Add handling of vector >>>> comparison with boolean result. >>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow >>>> comparison of vector operands with boolean result for EQ/NE only. >>>> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison. >>>> (verify_gimple_cond): Likewise. >>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform >>>> combining for non-compatible vector types. >>>> * tree-vect-loop.c (is_valid_sink): New function. >>>> (optimize_mask_stores): Likewise. >>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize >>>> has_mask_store field of vect_info. >>>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for >>>> vectorized loops having masked stores. >>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and >>>> correspondent macros. >>>> (optimize_mask_stores): Add prototype. >>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector >>>> type. >>>> >>>> gcc/testsuite/ChangeLog: >>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
PR68542.patch2
Description: Binary data