On Mon, Aug 13, 2012 at 3:03 PM, Marc Glisse <marc.gli...@inria.fr> wrote: > On Mon, 13 Aug 2012, Richard Guenther wrote: > >> On Sat, Aug 11, 2012 at 2:25 PM, Marc Glisse <marc.gli...@inria.fr> wrote: >>> >>> On Fri, 10 Aug 2012, Marc Glisse wrote: >>> >>>> 1) I am not sure we always want to combine permutations. Indeed, someone >>>> (user? vectorizer?) may have written 2 permutations to help the backend >>>> generate optimal code, whereas it will do a bad job on the complicated >>>> combined permutation. At tree level, I am not sure what can be done >>>> except >>>> possibly restrict the optimization to the case where the combined >>>> permutation is the identity. At the RTL level, we could compare what >>>> insns >>>> are generated, but that's going to be painful (doing anything with >>>> permutations at the RTL level is painful). > > > I guess people will complain soon enough if this causes horrible performance > regressions in vectorized code. > > >>> +/* Return true if VAR has no nondebug use but in stmt. */ >>> +static bool >>> +is_only_used_in (const_tree var, const_gimple stmt) >>> +{ >>> + const ssa_use_operand_t *const ptr0 = &(SSA_NAME_IMM_USE_NODE (var)); >>> + const ssa_use_operand_t *ptr = ptr0->next; >>> + >>> + for (; ptr != ptr0; ptr = ptr->next) >>> + { >>> + const_gimple use_stmt = USE_STMT (ptr); >>> + if (is_gimple_debug (use_stmt)) >>> + continue; >>> + if (use_stmt != stmt) >>> + return false; >>> + } >>> + return true; >>> +} >> >> >> if (!single_imm_use (var, &use, &use_stmt) || use_stmt != stmt) >> >> instead of the above. > > > I don't think that works with statements that use a variable twice. > > >>> +/* Combine two shuffles in a row. Returns 1 if there were any changes >>> + made, 2 if cfg-cleanup needs to run. Else it returns 0. */ >>> + >>> +static int >>> +simplify_permutation (gimple_stmt_iterator *gsi) >>> +{ >>> + gimple stmt = gsi_stmt (*gsi); >>> + gimple def_stmt; >>> + tree op0, op1, op2, op3, mask; >>> + enum tree_code code = gimple_assign_rhs_code (stmt); >>> + enum tree_code code2; >>> + location_t loc = gimple_location (stmt); >>> + >>> + gcc_checking_assert (code == VEC_PERM_EXPR); >>> + >>> + op0 = gimple_assign_rhs1 (stmt); >>> + op1 = gimple_assign_rhs2 (stmt); >>> + op2 = gimple_assign_rhs3 (stmt); >>> + >>> + if (TREE_CODE (op0) != SSA_NAME) >>> + return 0; >>> + >>> + if (TREE_CODE (op2) != VECTOR_CST) >>> + return 0; >>> + >>> + def_stmt = SSA_NAME_DEF_STMT (op0); >>> + if (!def_stmt || !is_gimple_assign (def_stmt) >>> + || !can_propagate_from (def_stmt) >>> + || !is_only_used_in (op0, stmt)) >> >> >> Or rather than the above, simply check >> >> || !has_single_use (op0) >> >> here. > > > Then there was my previous (non-working) patch that used > get_prop_source_stmt. > > >>> + return 0; >>> + >>> + /* Check that it is only used here. We cannot use has_single_use >>> + since the expression is using it twice itself... */ >> >> >> Ah ... so then >> >> || num_imm_uses (op0) != 2 > > > Ah, ok, that's simpler indeed, but there were such dire warnings to never > use that evil function unless absolutely necessary that I didn't dare use > it... Thanks for the permission.
If your new predicate would match more places (can you do a quick search?) then we want it in tree-flow-inline.h instead of in tree-ssa-forwprop.c. But yes, num_imm_uses can be expensive. For now just stick with the above. >>> + code2 = gimple_assign_rhs_code (def_stmt); >>> + >>> + /* Two consecutive shuffles. */ >>> + if (code2 == VEC_PERM_EXPR) >>> + { >>> + op3 = gimple_assign_rhs3 (def_stmt); >>> + if (TREE_CODE (op3) != VECTOR_CST) >>> + return 0; >>> + mask = fold_build3_loc (loc, VEC_PERM_EXPR, TREE_TYPE (op3), >>> + op3, op3, op2); >>> + gcc_assert (TREE_CODE (mask) == VECTOR_CST); >> >> >> which means you can use >> >> mask = fold_ternary (loc, ...); >> >> Ok with that changes. > > > Thanks a lot, I'll do that. > > Next step should be either BIT_FIELD_REF(VEC_PERM_EXPR) or > VEC_PERM_EXPR(CONSTRUCTOR). Is there a good way to determine whether this > kind of combination should be done forwards or backwards (i.e. start from > VEC_PERM_EXPR and look if its single use is in BIT_FIELD_REF, or start from > BIT_FIELD_REF and look if its argument is a VEC_PERM_EXPR only used here? The natural SSA (and forwprop) way is to look for BIT_FIELD_REF and see if the def stmt is a VEC_PERM_EXPR. Richard. > -- > Marc Glisse