On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi All, > > We implemented additional support for pragma omp simd in part of > extended if-conversion loops with such pragma. These extensions > include: > > 1. All extensions are performed only if considered loop or its outer > loop was marked with pragma omp simd (force_vectorize); For ordinary > loops behavior was not changed. > 2. Took off cfg restriction on basic block which can have more than 2 > predecessors. > 3. Put additional restriction on phi nodes which was missed in current design: > all phi nodes must be in non-predicated basic block to conform > semantic of COND_EXPR which is used for transformation.
How is that so? If the PHI is predicated then its result will be used in a PHI node again and thus we'd create a sequence of COND_EXPRs. No? > 4. Extend predication of phi nodes: phi may have more than 2 arguments > with some limitations: > - for phi nodes which have more than 2 arguments, but only two > arguments are different and one of them has the only occurence, > transformation to single COND_EXPR can be done. > - if phi node has more different arguments and all edge predicates > correspondent to phi-arguments are disjoint, a chain of COND_EXPR > will be generated for it. In current design very simple check is used: > check starting from end that two edges correspondent to neighbor > arguments have common predecessor which is used for further check > with next edge. > These guarantee that phi predication will produce the correct result. Btw, you can think of these extensions as unfactoring a PHI node by inserting forwarder blocks. Thus x = PHI <1(2), 1(3), 2(4)> becomes bb 5: <forwarder-from(2)-and(3)> x = PHI <1(5), 2(4)> and x = PHI <1(2), 2(3), 3(4)> becomes bb 5: x' = PHI <1(2), 2(3)> b = PHI<x'(5), 3(4)> which means that 3) has to work. Note that we want this kind of PHI transforms for out-of-SSA as well to reduce the number of copies we need to insert on edges. Thus it would be nice if you implemented 4) in terms of a pre-pass over the force_vect loops PHI nodes, applying that CFG transform. And make 3) work properly if it doesn't already. It looks like you introduce a "negate predicate" to work around the critical edge limitation? Please instead change if-conversion to work with edge predicates (as opposed to BB predicates). Thanks, Richard. > > Here is example of such extended predication (compile with -march=core-avx2): > #pragma omp simd safelen(8) > for (i=0; i<512; i++) > { > float t = a[i]; > if (t > 0 & t < 1.0e+17f) > if (c[i] != 0) > res += 1; > } > <bb 4>: > # res_15 = PHI <res_1(5), 0(3)> > # i_16 = PHI <i_11(5), 0(3)> > # ivtmp_17 = PHI <ivtmp_14(5), 512(3)> > t_5 = a[i_16]; > _6 = t_5 > 0.0; > _7 = t_5 < 9.9999998430674944e+16; > _8 = _7 & _6; > _ifc__28 = (unsigned int) _8; > _10 = &c[i_16]; > _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0; > _9 = MASK_LOAD (_10, 0B, _ifc__36); > _ifc__29 = _ifc__28 != 0 ? 1 : 0; > _ifc__30 = (int) _ifc__29; > _ifc__31 = _9 != 0 ? _ifc__30 : 0; > _ifc__32 = _ifc__28 != 0 ? 1 : 0; > _ifc__33 = (int) _ifc__32; > _ifc__34 = _9 == 0 ? _ifc__33 : 0; > _ifc__35 = _ifc__31 != 0 ? 1 : 0; > res_1 = res_15 + _ifc__35; > i_11 = i_16 + 1; > ivtmp_14 = ivtmp_17 - 1; > if (ivtmp_14 != 0) > goto <bb 4>; > > Bootstrap and regression testing did not show any new failures. > > gcc/ChageLog > > 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com> > > * tree-if-conv.c (flag_force_vectorize): New variable. > (struct bb_predicate_s): Add negate_predicate field. > (bb_negate_predicate): New function. > (set_bb_negate_predicate): New function. > (bb_copy_predicate): New function. > (add_stmt_to_bb_predicate_gimplified_stmts): New function. > (init_bb_predicate): Add initialization of negate_predicate field. > (reset_bb_predicate): Reset negate_predicate to NULL_TREE. > (convert_name_to_cmp): New function. > (get_type_for_cond): New function. > (convert_bool_predicate): New function. > (predicate_disjunction): New function. > (predicate_conjunction): New function. > (add_to_predicate_list): Add convert_bool argument. > Add call of predicate_disjunction if convert_bool argument is true. > (add_to_dst_predicate_list): Add convert_bool argument. > Add early function exit if edge target block is always executed. > Add call of predicate_conjunction if convert_bool argument is true. > Pass convert_bool argument for add_to_predicate_list. > (equal_phi_args): New function. > (phi_has_two_different_args): New function. > (phi_args_disjoint): New function. > (if_convertible_phi_p): Accept phi nodes with more than two args > for loops marked with pragma omp simd. Add check that phi nodes are > in non-predicated basic blocks. > (ifcvt_can_use_mask_load_store): Use flag_force_vectorize. > (all_edges_are_critical): New function. > (if_convertible_bb_p): Allow bb has more than two predecessors if > flag_force_vectorize was setup. Use call of all_edges_are_critical > to reject block if-conversion with imcoming critical edges only if > flag_force_vectorize was not setup. > (walk_cond_tree): New function. > (vect_bool_pattern_is_applicable): New function. > (predicate_bbs): Add convert_bool argument that is used to transform > comparison expressions of boolean type into conditional expressions > with integral operands. If bool_conv argument is false or both > outgoing edges are not critical old algorithm of predicate assignments > is used, otherwise the following code was added: check on applicable > of vect-bool-pattern recognition and trnasformation of > (bool) x != 0 --> y = (int) x; x != 0; > compute predicates for both outgoing edges one of which is critical > one using 'normal' edge, i.e. compute true and false predicates using > normal outgoing edge only; evaluated predicates are stored in > predicate and negate_predicate fields of struct bb_predicate_s and > negate_predicate of normal edge conatins predicate of critical edge, > but generated gimplified statements are stored in their destination > block fields. Additional argument 'convert_bool" is passed to > add_to_dst_predicate_list and add_to_predicate_list. > (if_convertible_loop_p_1): Call predicate_bbs with additional argument > equal to false. > (find_phi_replacement_condition): Extend function interface: > it returns NULL if given phi node must be handled by means of > extended phi node predication. If number of predecessors of phi-block > is equal 2 and atleast one incoming edge is not critical original > algorithm is used. > (is_cond_scalar_reduction): Add 'extended' argument which signals that > both phi arguments must be evaluated through phi_has_two_different_args. > (predicate_scalar_phi): Add invoŅation of convert_name_to_cmp if cond > is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction. > (get_predicate_for_edge): New function. > (find_insertion_point): New function. > (predicate_phi_disjoint_args): New function. > (predicate_extended_scalar_phi): New function. > (predicate_all_scalar_phis): Add code to set-up gimple statement > iterator for predication of extended scalar phi's for insertion. > (insert_gimplified_predicates): Add test for non-predicated basic > blocks that there are no gimplified statements to insert. Insert > predicates at the block begining for extended if-conversion. > (predicate_mem_writes): Invoke convert_name_to_cmp for extended > predication to build mask. > (combine_blocks): Pass flag_force_vectorize to predicate_bbs. > (split_crit_edge): New function. > (tree_if_conversion): Initialize flag_force_vectorize from current > loop or outer loop (to support pragma omp declare). Invoke > split_crit_edge for extended predication. Do loop versioning for > innermost loop marked with pragma omp simd.