on 2019/3/11 6:24 PM, Richard Biener wrote:
> On Mon, 11 Mar 2019, Kewen.Lin wrote:
>
>> on 2019/3/8 6:57 PM, Richard Biener wrote:
>>> On Fri, Mar 8, 2019 at 2:40 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> As PR88497 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88497),
>>>> when we meet some code pattern like:
>>>>
>>>>   V1[0] + V1[1] + ... + V1[k] + V2[0] + ... + V2[k] + ... Vn[k]
>>>>   // V1...Vn of VECTOR_TYPE
>>>>
>>>> we can teach reassoc to transform it to:
>>>>
>>>>   Vs = (V1 + V2 + ... + Vn)
>>>>   Vs[0] + Vs[1] + ... + Vs[k]
>>>>
>>>> It saves addition and bit_field_ref operations and exposes more
>>>> opportunities for downstream passes.  I notice that even on a target
>>>> that doesn't support the vector type, so that it gets expanded in
>>>> veclower, it's still better to have it, since the generated sequence
>>>> is more friendly for widening_mul.  (If DCE ran once more after
>>>> forwprop, the result should be the same.)
>>>>
>>>> Bootstrapped/regtested on powerpc64le-linux-gnu, ok for trunk?
>>>
>>> Hmm.  You are basically vectorizing the reduction partially here (note
>>> basic-block vectorization doesn't handle reductions yet).  So I'm not sure
>>> whether we should eventually just change op sort order to sort
>>> v1[1] after v2[0] to sort v1[0] and v2[0] together.  That would maybe also
>>> be an intermediate step that makes implementing the vectorization part
>>> "easier"?  I also imagine that the "vectorization" would be profitable even
>>> if there's just two elements of the vectors summed and the real vectors are
>>> larger.
>>>
>>
>> Hi Richard,
>>
>> Sorry, I have no idea about basic-block vectorization (SLP?), did you
>> suggest supporting this there rather than in reassoc?  Changing op sort
>> order for its expected pattern?
>
> I was thinking you're smashing two things together here - one part
> suitable for reassociation and one that's on the border.  The reassoc
> pass has already gathered quite some stuff that doesn't really belong
> there, so we should at least think twice before adding to that.
>
Got it!  Since the discussion context of the PR is mainly about
reassociation, I started to look at it from there. :)

>>> Note that the code you transform contains no vector operations (just
>>> element extracts) and thus you need to check for target support before
>>> creating vector add.
>>>
>>
>> I had the same concern before.  But I thought that if this VECTOR type
>> is valid on the target, the addition operation should be fundamental on
>> the target, then it's ok; if it's invalid, then veclower will expand it
>> into scalar type as well as the addition operation.  So it looks fine to
>> guard it.  Does it make sense?
>
> But there's a reassoc phase after veclower.  Generally we avoid generating
> vector code not supported by the target.  You are not checking whether
> the vector type is valid on the target either btw.  The user might have
> written an optimal scalar sequence for a non-natively supported vector
> type (and veclower wouldn't touch the original scalar code) and veclower
> generally makes a mess of things, likely worse than the original code.
>

Thanks for your explanation!  I'll update the code to check whether the
vector operation is supported by the target.

>>
>>> This is for GCC 10.  Also it doesn't fix PR88497, does it?
>>>
>>
>> Yes, it's in low priority and for GCC 10.  I think it does.  Did I miss
>> something?
>
> Wasn't the original testcase in the PR for _vectorized_ code?  The
> PR then got an additional testcase which you indeed fix.
>

The original case is actually optimal with -Ofast, but the additional case
isn't. :)

>> For comment 1 and comment 7 cases, trunk gcc generates the expected code
>> with -Ofast.  There aren't any issues for the original loop.  But for the
>> expanded code in comment 1 (the manually expanded case), gcc can't
>> generate better code with multiply-add.
>>
>> From comment 9 case (same as comment 1 expanded version):
>>
>> without this patch, optimized tree and final asm:
>>
>>   <bb 2> [local count: 1073741824]:
>>   _1 = *arg2_26(D);
>>   _2 = *arg3_27(D);
>>   _3 = _1 * _2;
>>   _4 = BIT_FIELD_REF <_3, 64, 0>;
>>   _5 = BIT_FIELD_REF <_3, 64, 64>;
>>   _18 = _4 + _5;
>>   _7 = MEM[(__vector double *)arg2_26(D) + 16B];
>>   _8 = MEM[(__vector double *)arg3_27(D) + 16B];
>>   _9 = _7 * _8;
>>   _10 = BIT_FIELD_REF <_9, 64, 0>;
>>   _11 = BIT_FIELD_REF <_9, 64, 64>;
>>   _31 = _10 + _11;
>>   _29 = _18 + _31;
>>   _13 = MEM[(__vector double *)arg2_26(D) + 32B];
>>   _14 = MEM[(__vector double *)arg3_27(D) + 32B];
>>   _15 = _13 * _14;
>>   _16 = BIT_FIELD_REF <_15, 64, 0>;
>>   _17 = BIT_FIELD_REF <_15, 64, 64>;
>>   _6 = _16 + _17;
>>   _12 = _6 + accumulator_28(D);
>>   _37 = _12 + _29;
>>   _19 = MEM[(__vector double *)arg2_26(D) + 48B];
>>   _20 = MEM[(__vector double *)arg3_27(D) + 48B];
>>   _21 = _19 * _20;
>>   _22 = BIT_FIELD_REF <_21, 64, 0>;
>>   _23 = BIT_FIELD_REF <_21, 64, 64>;
>>   _24 = _22 + _23;
>>   accumulator_32 = _24 + _37;
>>   return accumulator_32;
>>
>> 0000000000000000 <foo>:
>>    0:  01 00 e5 f4   lxv     vs7,0(r5)
>>    4:  11 00 05 f4   lxv     vs0,16(r5)
>>    8:  21 00 24 f5   lxv     vs9,32(r4)
>>    c:  21 00 c5 f4   lxv     vs6,32(r5)
>>   10:  01 00 44 f5   lxv     vs10,0(r4)
>>   14:  11 00 64 f5   lxv     vs11,16(r4)
>>   18:  31 00 05 f5   lxv     vs8,48(r5)
>>   1c:  31 00 84 f5   lxv     vs12,48(r4)
>>   20:  80 33 29 f1   xvmuldp vs9,vs9,vs6
>>   24:  80 3b 4a f1   xvmuldp vs10,vs10,vs7
>>   28:  80 03 6b f1   xvmuldp vs11,vs11,vs0
>>   2c:  80 43 8c f1   xvmuldp vs12,vs12,vs8
>>   30:  50 4b 09 f1   xxspltd vs8,vs9,1
>>   34:  50 5b eb f0   xxspltd vs7,vs11,1
>>   38:  50 53 0a f0   xxspltd vs0,vs10,1
>>   3c:  2a 48 28 fd   fadd    f9,f8,f9
>>   40:  2a 58 67 fd   fadd    f11,f7,f11
>>   44:  2a 50 00 fc   fadd    f0,f0,f10
>>   48:  50 63 4c f1   xxspltd vs10,vs12,1
>>   4c:  2a 60 8a fd   fadd    f12,f10,f12
>>   50:  2a 08 29 fd   fadd    f9,f9,f1
>>   54:  2a 58 00 fc   fadd    f0,f0,f11
>>   58:  2a 48 20 fc   fadd    f1,f0,f9
>>   5c:  2a 08 2c fc   fadd    f1,f12,f1
>>   60:  20 00 80 4e   blr
>>
>>
>> with this patch, optimized tree and final asm:
>>
>>   _1 = *arg2_26(D);
>>   _2 = *arg3_27(D);
>>   _7 = MEM[(__vector double *)arg2_26(D) + 16B];
>>   _8 = MEM[(__vector double *)arg3_27(D) + 16B];
>>   _9 = _7 * _8;
>>   _5 = .FMA (_1, _2, _9);
>>   _13 = MEM[(__vector double *)arg2_26(D) + 32B];
>>   _14 = MEM[(__vector double *)arg3_27(D) + 32B];
>>   _19 = MEM[(__vector double *)arg2_26(D) + 48B];
>>   _20 = MEM[(__vector double *)arg3_27(D) + 48B];
>>   _21 = _19 * _20;
>>   _10 = .FMA (_13, _14, _21);
>>   _41 = _5 + _10;
>>   _43 = BIT_FIELD_REF <_41, 64, 64>;
>>   _42 = BIT_FIELD_REF <_41, 64, 0>;
>>   _44 = _42 + _43;
>>   accumulator_32 = accumulator_28(D) + _44;
>>   return accumulator_32;
>>
>> 0000000000000000 <foo>:
>>    0:  11 00 04 f4   lxv     vs0,16(r4)
>>    4:  11 00 c5 f4   lxv     vs6,16(r5)
>>    8:  31 00 44 f5   lxv     vs10,48(r4)
>>    c:  31 00 e5 f4   lxv     vs7,48(r5)
>>   10:  01 00 84 f5   lxv     vs12,0(r4)
>>   14:  01 00 05 f5   lxv     vs8,0(r5)
>>   18:  21 00 64 f5   lxv     vs11,32(r4)
>>   1c:  21 00 25 f5   lxv     vs9,32(r5)
>>   20:  80 33 00 f0   xvmuldp vs0,vs0,vs6
>>   24:  80 3b 4a f1   xvmuldp vs10,vs10,vs7
>>   28:  08 43 0c f0   xvmaddadp vs0,vs12,vs8
>>   2c:  90 54 8a f1   xxlor   vs12,vs10,vs10
>>   30:  08 4b 8b f1   xvmaddadp vs12,vs11,vs9
>>   34:  00 63 00 f0   xvadddp vs0,vs0,vs12
>>   38:  50 03 80 f1   xxspltd vs12,vs0,1
>>   3c:  2a 00 0c fc   fadd    f0,f12,f0
>>   40:  2a 08 20 fc   fadd    f1,f0,f1
>>   44:  20 00 80 4e   blr
>
> OK, I see.
>
> Btw, your code should equally well work for multiplications and
> bit operations, no?  So it would be nice to extend it for those.
>

Good point!  I'll extend it.
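Just for the record, here is a rough sketch of the multiplication flavour
I have in mind for the testsuite (purely illustrative, names and the final
form may differ; it would need -ffast-math like the existing test so the
FP multiplications can be reassociated):

  typedef double v2df __attribute__ ((vector_size (16)));
  /* The element products form one multiplication chain, so reassoc could
     undistribute it into one vector multiply plus element extracts,
     i.e. t = arg1[0] * arg1[1]; accumulator *= t[0] * t[1].  */
  double
  test_mul (double accumulator, v2df arg1[])
  {
    accumulator *= arg1[0][0] * arg1[0][1];
    accumulator *= arg1[1][0] * arg1[1][1];
    return accumulator;
  }

I'd expect the bit operation variants (BIT_AND_EXPR/BIT_IOR_EXPR/
BIT_XOR_EXPR) on integer vectors to follow the same shape.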
> Please add the appropriate target support checks for the vector
> operations.
>

OK.  Will fix it.

> Richard.
>
>>> Richard.
>>>
>>>> Thanks in advance!
>>>>
>>>>
>>>> gcc/ChangeLog
>>>>
>>>> 2019-03-08  Kewen Lin  <li...@gcc.gnu.org>
>>>>
>>>>     PR target/88497
>>>>     * tree-ssa-reassoc.c (reassociate_bb): Swap the positions of
>>>>     GIMPLE_BINARY_RHS check and gimple_visited_p check, call new
>>>>     function undistribute_bitref_for_vector.
>>>>     (undistribute_bitref_for_vector): New function.
>>>>     (cleanup_vinfo_map): Likewise.
>>>>     (unsigned_cmp): Likewise.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>>
>>>> 2019-03-08  Kewen Lin  <li...@gcc.gnu.org>
>>>>
>>>>     * gcc.dg/tree-ssa/pr88497.c: New test.
>>>>
>>>> ---
>>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497.c |  18 +++
>>>>  gcc/tree-ssa-reassoc.c                  | 274 +++++++++++++++++++++++++++++++-
>>>>  2 files changed, 287 insertions(+), 5 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
>>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
>>>> new file mode 100644
>>>> index 0000000..4d9ac67
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c
>>>> @@ -0,0 +1,18 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
>>>> +typedef double v2df __attribute__ ((vector_size (16)));
>>>> +double
>>>> +test (double accumulator, v2df arg1[], v2df arg2[])
>>>> +{
>>>> +  v2df temp;
>>>> +  temp = arg1[0] * arg2[0];
>>>> +  accumulator += temp[0] + temp[1];
>>>> +  temp = arg1[1] * arg2[1];
>>>> +  accumulator += temp[0] + temp[1];
>>>> +  temp = arg1[2] * arg2[2];
>>>> +  accumulator += temp[0] + temp[1];
>>>> +  temp = arg1[3] * arg2[3];
>>>> +  accumulator += temp[0] + temp[1];
>>>> +  return accumulator;
>>>> +}
>>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" } } */
>>>> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
>>>> index e1c4dfe..fc0e297 100644
>>>> --- a/gcc/tree-ssa-reassoc.c
>>>> +++ b/gcc/tree-ssa-reassoc.c
>>>> @@ -1772,6 +1772,263 @@ undistribute_ops_list (enum tree_code opcode,
>>>>    return changed;
>>>>  }
>>>>
>>>> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
>>>> +    - offsets: for different BIT_FIELD_REF offsets accessing same VECTOR.
>>>> +    - ops_indexes: the index of vec ops* for each relavant BIT_FIELD_REF.  */
>>>> +struct v_info
>>>> +{
>>>> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
>>>> +  auto_vec<unsigned, 32> ops_indexes;
>>>> +};
>>>> +
>>>> +typedef struct v_info *v_info_ptr;
>>>> +
>>>> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
>>>> +static int
>>>> +unsigned_cmp (const void *p_i, const void *p_j)
>>>> +{
>>>> +  if (*(const unsigned *) p_i >= *(const unsigned *) p_j)
>>>> +    return 1;
>>>> +  else
>>>> +    return -1;
>>>> +}
>>>> +
>>>> +/* Cleanup hash map for VECTOR information.  */
>>>> +static void
>>>> +cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
>>>> +{
>>>> +  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
>>>> +       it != info_map.end (); ++it)
>>>> +    {
>>>> +      v_info_ptr info = (*it).second;
>>>> +      delete info;
>>>> +      (*it).second = NULL;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
>>>> +     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
>>>> +   is transformed to
>>>> +     Vs = (V1 + V2 + ... + Vn)
>>>> +     Vs[0] + Vs[1] + ... + Vs[k]
>>>> +
>>>> +   The basic steps are listed below:
>>>> +
>>>> +    1) Check the addition chain *OPS by looking those summands coming from
>>>> +       VECTOR bit_field_ref on VECTOR type.  Put the information into
>>>> +       v_info_map for each satisfied summand, using VECTOR SSA_NAME as key.
>>>> +
>>>> +    2) For each key (VECTOR SSA_NAME), validate all its BIT_FIELD_REFs are
>>>> +       continous, they can cover the whole VECTOR perfectly without any holes.
>>>> +       Obtain one VECTOR list which contain candidates to be transformed.
>>>> +
>>>> +    3) Build the addition statements for all VECTOR candidates, generate
>>>> +       BIT_FIELD_REFs accordingly.
>>>> +
>>>> +   TODO: Now the implementation restrict all candidate VECTORs should have the
>>>> +   same VECTOR type, it can be extended into different groups by VECTOR types
>>>> +   in future if any profitable cases found.  */
>>>> +static bool
>>>> +undistribute_bitref_for_vector (enum tree_code opcode, vec<operand_entry *> *ops,
>>>> +                                struct loop *loop)
>>>> +{
>>>> +  if (ops->length () <= 1 || opcode != PLUS_EXPR)
>>>> +    return false;
>>>> +
>>>> +  hash_map<tree, v_info_ptr> v_info_map;
>>>> +  operand_entry *oe1;
>>>> +  unsigned i;
>>>> +
>>>> +  /* Find those summands from VECTOR BIT_FIELD_REF in addition chain, put the
>>>> +     information into map.  */
>>>> +  FOR_EACH_VEC_ELT (*ops, i, oe1)
>>>> +    {
>>>> +      enum tree_code dcode;
>>>> +      gimple *oe1def;
>>>> +
>>>> +      if (TREE_CODE (oe1->op) != SSA_NAME)
>>>> +        continue;
>>>> +      oe1def = SSA_NAME_DEF_STMT (oe1->op);
>>>> +      if (!is_gimple_assign (oe1def))
>>>> +        continue;
>>>> +      dcode = gimple_assign_rhs_code (oe1def);
>>>> +      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
>>>> +        continue;
>>>> +
>>>> +      tree rhs = gimple_op (oe1def, 1);
>>>> +      tree op0 = TREE_OPERAND (rhs, 0);
>>>> +
>>>> +      if (TREE_CODE (op0) != SSA_NAME
>>>> +          || TREE_CODE (TREE_TYPE (op0)) != VECTOR_TYPE)
>>>> +        continue;
>>>> +
>>>> +      tree op1 = TREE_OPERAND (rhs, 1);
>>>> +      tree op2 = TREE_OPERAND (rhs, 2);
>>>> +
>>>> +      tree elem_type = TREE_TYPE (TREE_TYPE (op0));
>>>> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>>>> +      if (size != TREE_INT_CST_LOW (op1))
>>>> +        continue;
>>>> +
>>>> +      v_info_ptr *info_ptr = v_info_map.get (op0);
>>>> +      if (info_ptr)
>>>> +        {
>>>> +          v_info_ptr info = *info_ptr;
>>>> +          info->offsets.safe_push (TREE_INT_CST_LOW (op2));
>>>> +          info->ops_indexes.safe_push (i);
>>>> +        }
>>>> +      else
>>>> +        {
>>>> +          v_info_ptr info = new v_info;
>>>> +          info->offsets.safe_push (TREE_INT_CST_LOW (op2));
>>>> +          info->ops_indexes.safe_push (i);
>>>> +          v_info_map.put (op0, info);
>>>> +        }
>>>> +    }
>>>> +
>>>> +  /* At least two VECTOR to combine.  */
>>>> +  if (v_info_map.elements () <= 1)
>>>> +    {
>>>> +      cleanup_vinfo_map (v_info_map);
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +  /* Use the first VECTOR and its information as the reference.
>>>> +     Firstly, we should validate it, that is:
>>>> +       1) sorted offsets are adjacent, no holes.
>>>> +       2) can fill the whole VECTOR perfectly.  */
>>>> +  hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
>>>> +  tree ref_vec = (*it).first;
>>>> +  v_info_ptr ref_info = (*it).second;
>>>> +  ref_info->offsets.qsort (unsigned_cmp);
>>>> +  tree elem_type = TREE_TYPE (TREE_TYPE (ref_vec));
>>>> +  unsigned HOST_WIDE_INT elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>>>> +  unsigned HOST_WIDE_INT curr;
>>>> +  unsigned HOST_WIDE_INT prev = ref_info->offsets[0];
>>>> +
>>>> +  FOR_EACH_VEC_ELT_FROM (ref_info->offsets, i, curr, 1)
>>>> +    {
>>>> +      /* Continous check.  */
>>>> +      if (curr != (prev + elem_size))
>>>> +        {
>>>> +          cleanup_vinfo_map (v_info_map);
>>>> +          return false;
>>>> +        }
>>>> +      prev = curr;
>>>> +    }
>>>> +
>>>> +  /* Check whether fill the whole.  */
>>>> +  if ((prev + elem_size) != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (ref_vec))))
>>>> +    {
>>>> +      cleanup_vinfo_map (v_info_map);
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +  auto_vec<tree> vectors (v_info_map.elements ());
>>>> +  vectors.quick_push (ref_vec);
>>>> +
>>>> +  /* Use the ref_vec to filter others.  */
>>>> +  for (++it; it != v_info_map.end (); ++it)
>>>> +    {
>>>> +      tree vec = (*it).first;
>>>> +      v_info_ptr info = (*it).second;
>>>> +      if (TREE_TYPE (ref_vec) != TREE_TYPE (vec))
>>>> +        continue;
>>>> +      if (ref_info->offsets.length () != info->offsets.length ())
>>>> +        continue;
>>>> +      bool same_offset = true;
>>>> +      info->offsets.qsort (unsigned_cmp);
>>>> +      for (unsigned i = 0; i < ref_info->offsets.length (); i++)
>>>> +        {
>>>> +          if (ref_info->offsets[i] != info->offsets[i])
>>>> +            {
>>>> +              same_offset = false;
>>>> +              break;
>>>> +            }
>>>> +        }
>>>> +      if (!same_offset)
>>>> +        continue;
>>>> +      vectors.quick_push (vec);
>>>> +    }
>>>> +
>>>> +  if (vectors.length () < 2)
>>>> +    {
>>>> +      cleanup_vinfo_map (v_info_map);
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +  tree tr;
>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +    {
>>>> +      fprintf (dump_file, "The bit_field_ref vector list for summation: ");
>>>> +      FOR_EACH_VEC_ELT (vectors, i, tr)
>>>> +        {
>>>> +          print_generic_expr (dump_file, tr);
>>>> +          fprintf (dump_file, " ");
>>>> +        }
>>>> +      fprintf (dump_file, "\n");
>>>> +    }
>>>> +
>>>> +  /* Build the sum for all candidate VECTORs.  */
>>>> +  unsigned idx;
>>>> +  gimple *sum = NULL;
>>>> +  v_info_ptr info;
>>>> +  tree sum_vec = ref_vec;
>>>> +  FOR_EACH_VEC_ELT_FROM (vectors, i, tr, 1)
>>>> +    {
>>>> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
>>>> +      info = *(v_info_map.get (tr));
>>>> +      unsigned j;
>>>> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
>>>> +        {
>>>> +          gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>>>> +          gimple_set_visited (def, true);
>>>> +          (*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
>>>> +          (*ops)[idx]->rank = 0;
>>>> +        }
>>>> +      sum_vec = gimple_get_lhs (sum);
>>>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +        {
>>>> +          fprintf (dump_file, "Generating addition -> ");
>>>> +          print_gimple_stmt (dump_file, sum, 0);
>>>> +        }
>>>> +    }
>>>> +
>>>> +  /* Referring to any good shape vector (here using ref_vec), generate the
>>>> +     BIT_FIELD_REF statements accordingly.  */
>>>> +  info = *(v_info_map.get (ref_vec));
>>>> +  gcc_assert (sum);
>>>> +  FOR_EACH_VEC_ELT (info->ops_indexes, i, idx)
>>>> +    {
>>>> +      tree dst = make_ssa_name (elem_type);
>>>> +      gimple *gs
>>>> +        = gimple_build_assign (dst, BIT_FIELD_REF,
>>>> +                               build3 (BIT_FIELD_REF, elem_type, sum_vec,
>>>> +                                       TYPE_SIZE (elem_type),
>>>> +                                       bitsize_int (info->offsets[i])));
>>>> +      insert_stmt_after (gs, sum);
>>>> +      update_stmt (gs);
>>>> +      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>>>> +      gimple_set_visited (def, true);
>>>> +      (*ops)[idx]->op = gimple_assign_lhs (gs);
>>>> +      (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
>>>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +        {
>>>> +          fprintf (dump_file, "Generating bit_field_ref -> ");
>>>> +          print_gimple_stmt (dump_file, gs, 0);
>>>> +        }
>>>> +    }
>>>> +
>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +    {
>>>> +      fprintf (dump_file, "undistributiong bit_field_ref for vector done.\n");
>>>> +    }
>>>> +
>>>> +  cleanup_vinfo_map (v_info_map);
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>>  /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
>>>>     expression, examine the other OPS to see if any of them are comparisons
>>>>     of the same values, which we may be able to combine or eliminate.
>>>> @@ -5880,11 +6137,6 @@ reassociate_bb (basic_block bb)
>>>>            tree lhs, rhs1, rhs2;
>>>>            enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>>>>
>>>> -          /* If this is not a gimple binary expression, there is
>>>> -             nothing for us to do with it.  */
>>>> -          if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
>>>> -            continue;
>>>> -
>>>>            /* If this was part of an already processed statement,
>>>>               we don't need to touch it again.  */
>>>>            if (gimple_visited_p (stmt))
>>>> @@ -5911,6 +6163,11 @@ reassociate_bb (basic_block bb)
>>>>                continue;
>>>>              }
>>>>
>>>> +          /* If this is not a gimple binary expression, there is
>>>> +             nothing for us to do with it.  */
>>>> +          if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
>>>> +            continue;
>>>> +
>>>>            lhs = gimple_assign_lhs (stmt);
>>>>            rhs1 = gimple_assign_rhs1 (stmt);
>>>>            rhs2 = gimple_assign_rhs2 (stmt);
>>>> @@ -5950,6 +6207,13 @@ reassociate_bb (basic_block bb)
>>>>                optimize_ops_list (rhs_code, &ops);
>>>>              }
>>>>
>>>> +          if (undistribute_bitref_for_vector (rhs_code, &ops,
>>>> +                                              loop_containing_stmt (stmt)))
>>>> +            {
>>>> +              ops.qsort (sort_by_operand_rank);
>>>> +              optimize_ops_list (rhs_code, &ops);
>>>> +            }
>>>> +
>>>>            if (rhs_code == PLUS_EXPR
>>>>                && transform_add_to_multiply (&ops))
>>>>              ops.qsort (sort_by_operand_rank);
>>>> --
>>>> 2.7.4
>>>>
>>>
>>
>>
>
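P.S. For the target support check mentioned above, I'm currently thinking of
something along these lines, once the reference vector has been picked in
undistribute_bitref_for_vector (only a sketch; the exact check and its
placement may change in the updated patch):

  /* Sketch: bail out unless the target natively supports the vector
     operation for the vector type we are about to generate.  */
  tree vec_type = TREE_TYPE (ref_vec);
  optab op = optab_for_tree_code (opcode, vec_type, optab_default);
  if (op == unknown_optab
      || optab_handler (op, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
    {
      cleanup_vinfo_map (v_info_map);
      return false;
    }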