Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> writes: > Hi all, > > This patch aims to optimise sequences involving uses of 1.0 / sqrt (a) under > -freciprocal-math and -funsafe-math-optimizations. > In particular consider: > > x = 1.0 / sqrt (a); > r1 = x * x; // same as 1.0 / a > r2 = a * x; // same as sqrt (a) > > If x, r1 and r2 are all used further on in the code, this can be transformed > into: > tmp1 = 1.0 / a > tmp2 = sqrt (a) > tmp3 = tmp1 * tmp2 > x = tmp3 > r1 = tmp1 > r2 = tmp2
Nice optimisation :-) Someone who knows the pass better should review, but: There seems to be an implicit assumption that this is a win even when the r1 and r2 assignments are only conditionally executed. That's probably true, but it might be worth saying explicitly. > +/* Return TRUE if USE_STMT is a multiplication of DEF by A. */ > + > +static inline bool > +is_mult_by (gimple *use_stmt, tree def, tree a) > +{ > + if (gimple_code (use_stmt) == GIMPLE_ASSIGN > + && gimple_assign_rhs_code (use_stmt) == MULT_EXPR) > + { > + tree op0 = gimple_assign_rhs1 (use_stmt); > + tree op1 = gimple_assign_rhs2 (use_stmt); > + > + return (op0 == def && op1 == a) > + || (op0 == a && op1 == def); > + } > + return 0; > +} Seems like is_square_of could now be a light-weight wrapper around this. > @@ -652,6 +669,180 @@ execute_cse_reciprocals_1 (gimple_stmt_iterator > *def_gsi, tree def) > occ_head = NULL; > } > > +/* Transform sequences like > + x = 1.0 / sqrt (a); > + r1 = x * x; > + r2 = a * x; > + into: > + tmp1 = 1.0 / a; > + tmp2 = sqrt (a); > + tmp3 = tmp1 * tmp2; > + x = tmp3; > + r1 = tmp1; > + r2 = tmp2; > + depending on the uses of x, r1, r2. This removes one multiplication and > + allows the sqrt and division operations to execute in parallel. > + DEF_GSI is the gsi of the initial division by sqrt that defines > + DEF (x in the example abovs). */ > + > +static void > +optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def) > +{ > + use_operand_p use_p; > + imm_use_iterator use_iter; > + gimple *stmt = gsi_stmt (*def_gsi); > + tree x = def; > + tree orig_sqrt_ssa_name = gimple_assign_rhs2 (stmt); > + tree div_rhs1 = gimple_assign_rhs1 (stmt); > + > + if (TREE_CODE (orig_sqrt_ssa_name) != SSA_NAME > + || TREE_CODE (div_rhs1) != REAL_CST > + || !real_equal (&TREE_REAL_CST (div_rhs1), &dconst1)) > + return; > + > + gimple *sqrt_stmt = SSA_NAME_DEF_STMT (orig_sqrt_ssa_name); > + if (!is_gimple_call (sqrt_stmt) > + || !gimple_call_lhs (sqrt_stmt)) > + return; > + > + gcall *call = as_a <gcall *> (sqrt_stmt); Very minor, but: gcall *sqrt_stmt = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (orig_sqrt_ssa_name)); if (!sqrt_stmt || !gimple_call_lhs (sqrt_stmt)) return; would avoid the need for the separate as_a<>, and would mean that we only call gimple_call_* on gcalls. > + if (has_other_use) > + { > + /* Using the two temporaries tmp1, tmp2 from above > + the original x is now: > + x = tmp1 * tmp2. */ > + gcc_assert (mult_ssa_name); > + gcc_assert (sqr_ssa_name); > + gimple_stmt_iterator gsi2 = gsi_for_stmt (stmt); > + > + tree new_ssa_name > + = make_temp_ssa_name (TREE_TYPE (a), NULL, "recip_sqrt_transformed"); > + gimple *new_stmt > + = gimple_build_assign (new_ssa_name, MULT_EXPR, > + mult_ssa_name, sqr_ssa_name); > + gsi_insert_before (&gsi2, new_stmt, GSI_SAME_STMT); > + gcc_assert (gsi_stmt (gsi2) == stmt); > + gimple_assign_set_rhs_from_tree (&gsi2, new_ssa_name); > + fold_stmt (&gsi2); > + update_stmt (stmt); In this case we're replacing the statement in its original position, so there's no real need to use a temporary. It seems better to change the rhs_code, rhs1 and rhs2 of stmt in-place, with the same lhs as before. > @@ -762,6 +953,23 @@ pass_cse_reciprocals::execute (function *fun) > if (optimize_bb_for_size_p (bb)) > continue; Seems unnecessary to skip the new optimisation when optimising for size. Like you say, it saves a multiplication overall. Also: > + if (flag_unsafe_math_optimizations) > + { > + for (gimple_stmt_iterator gsi = gsi_after_labels (bb); > + !gsi_end_p (gsi); > + gsi_next (&gsi)) > + { > + gimple *stmt = gsi_stmt (gsi); > + > + if (gimple_has_lhs (stmt) > + && (def = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_DEF)) != NULL > + && FLOAT_TYPE_P (TREE_TYPE (def)) > + && TREE_CODE (def) == SSA_NAME > + && is_gimple_assign (stmt) > + && gimple_assign_rhs_code (stmt) == RDIV_EXPR) > + optimize_recip_sqrt (&gsi, def); > + } > + } It looks like this could safely be done in one of the existing walks (e.g. the execute_cse_reciprocals_1 one, if we do this when optimising for size). Thanks, Richard