On Mon, Jun 14, 2021 at 7:26 PM Stefan Schulze Frielinghaus
<stefa...@linux.ibm.com> wrote:
>
> On Thu, May 20, 2021 at 08:37:24PM +0200, Stefan Schulze Frielinghaus wrote:
> [...]
> > > but we won't ever arrive here because of the niters condition.  But
> > > yes, doing the pattern matching in the innermost loop processing code
> > > looks good to me - for the specific case it would be
> > >
> > >       /* Don't distribute loop if niters is unknown.  */
> > >       tree niters = number_of_latch_executions (loop);
> > >       if (niters == NULL_TREE || niters == chrec_dont_know)
> > > ---> here?
> > >         continue;
> >
> > Right, please find attached a new version of the patch where everything
> > is included in the loop distribution pass.  I will do a bootstrap and
> > regtest on IBM Z over night.  If you give me green light I will also do
> > the same on x86_64.
>
> Meanwhile I gave it a shot on x86_64 where the testsuite runs fine (at
> least the ldist-strlen testcase).  If you are Ok with the patch, then I
> would rebase and run the testsuites again and post a patch series
> including the rawmemchr implementation for IBM Z.
@@ -3257,6 +3261,464 @@ find_seed_stmts_for_distribution (class loop *loop, vec<gimple *> *work_list)
   return work_list->length () > 0;
 }

+static void
+generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
+                            data_reference_p store_dr, tree base, tree pattern,
+                            location_t loc)
+{

this new function needs a comment.  Applies to all of the new ones, btw.

+  gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (base))
+                       && TREE_TYPE (TREE_TYPE (base)) == TREE_TYPE (pattern));

this looks fragile and is probably unnecessary as well.

+  gcc_checking_assert (TREE_TYPE (reduction_var) == TREE_TYPE (base));

in general you want types_compatible_p () checks which for pointers means
all pointers are compatible ...  (a concrete sketch follows below)

(skipping stuff)

@@ -3321,10 +3783,20 @@ loop_distribution::execute (function *fun)
               && !optimize_loop_for_speed_p (loop)))
         continue;

-      /* Don't distribute loop if niters is unknown.  */
+      /* If niters is unknown don't distribute loop but rather try to transform
+         it to a call to a builtin.  */
       tree niters = number_of_latch_executions (loop);
       if (niters == NULL_TREE || niters == chrec_dont_know)
-        continue;
+        {
+          if (transform_reduction_loop (loop))
+            {
+              changed = true;
+              loops_to_be_destroyed.safe_push (loop);
+              if (dump_file)
+                fprintf (dump_file, "Loop %d transformed into a builtin.\n", loop->num);
+            }
+          continue;
+        }

please look at

  if (nb_generated_loops + nb_generated_calls > 0)
    {
      changed = true;
      if (dump_enabled_p ())
        dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
                         loc, "Loop%s %d distributed: split to %d loops "
                         "and %d library calls.\n", str, loop->num,
                         nb_generated_loops, nb_generated_calls);

and follow the use of dump_* and MSG_OPTIMIZED_LOCATIONS so the
transforms are reported with -fopt-info-loop.  (sketch below)

+
+  return transform_reduction_loop_1 (loop, load_dr, store_dr, reduction_var);
+}

what's the point in tail-calling here and visually splitting the function
in half?

(sorry for picking random pieces now ;))

+  for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+       gsi_next (&bsi), ++ninsns)
+    {

this counts debug insns, I guess you want gsi_next_nondebug at least.
not sure why you are counting PHIs at all btw - for the loops you match
you are expecting at most two, one IV and possibly one for the virtual
operand of the store?

+        if (gimple_has_volatile_ops (phi))
+          return false;

PHIs never have volatile ops.

+        if (gimple_clobber_p (phi))
+          continue;

or are clobbers.  Btw, can you factor out a helper from find_single_drs
working on a stmt to reduce code duplication?

+  tree reduction_var;
+  switch (gimple_code (reduction_stmt))
+    {
+    case GIMPLE_PHI:
+      reduction_var = gimple_phi_result (reduction_stmt);
+      break;
+    case GIMPLE_ASSIGN:
+      reduction_var = gimple_assign_lhs (reduction_stmt);
+      break;
+    default:
+      /* Bail out e.g. for GIMPLE_CALL.  */
+      return false;

gimple_get_lhs (reduction_stmt); would work for both PHIs and assigns
(sketch below).

+  if (reduction_var == NULL)
+    return false;

it can never be NULL here.

+  /* Bail out if this is a bitfield memory reference.  */
+  if (TREE_CODE (DR_REF (load_dr)) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (load_dr), 1)))
+    return false;

... I see this is again quite some code copied from find_single_drs, please
see how to avoid this much duplication by splitting out helpers.
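To make the types_compatible_p point concrete, the second assert would
read something like (untested):

  gcc_checking_assert (types_compatible_p (TREE_TYPE (reduction_var),
                                           TREE_TYPE (base)));

though as said, for two pointer types that is always true in GIMPLE, so
it then mostly documents intent rather than checking anything.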
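For the new transform that means using the opt-info machinery instead of
raw dump_file, something like (untested, using find_loop_location to get
at the location the same way the distribution path does for its loc):

          if (transform_reduction_loop (loop))
            {
              changed = true;
              loops_to_be_destroyed.safe_push (loop);
              if (dump_enabled_p ())
                dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
                                 find_loop_location (loop),
                                 "Loop %d transformed into a builtin.\n",
                                 loop->num);
            }
          continue;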
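With gimple_get_lhs the whole switch then shrinks to something like
(untested; keeping an explicit check so e.g. GIMPLE_CALLs are still
rejected as before):

  /* gimple_get_lhs handles both PHIs and assigns; bail out for
     everything else, e.g. GIMPLE_CALL.  */
  if (gimple_code (reduction_stmt) != GIMPLE_PHI
      && !is_gimple_assign (reduction_stmt))
    return false;
  tree reduction_var = gimple_get_lhs (reduction_stmt);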
+static bool
+transform_reduction_loop_1 (loop_p loop,
+                            data_reference_p load_dr,
+                            data_reference_p store_dr,
+                            tree reduction_var)
+{
+  tree load_ref = DR_REF (load_dr);
+  tree load_type = TREE_TYPE (load_ref);
+  tree load_access_base = build_fold_addr_expr (load_ref);
+  tree load_access_size = TYPE_SIZE_UNIT (load_type);
+  affine_iv load_iv, reduction_iv;
+  tree pattern;
+
+  /* A limitation of the current implementation is that we only support
+     constant patterns.  */
+  edge e = single_exit (loop);
+  gcond *cond_stmt = safe_dyn_cast <gcond *> (last_stmt (e->src));
+  if (!cond_stmt)
+    return false;

that looks like checks to be done at the start of transform_reduction_loop,
not this late.

+  if (gimple_cond_code (cond_stmt) != NE_EXPR
+      || gimple_cond_lhs (cond_stmt) != gimple_assign_lhs (DR_STMT (load_dr))
+      || TREE_CODE (pattern) != INTEGER_CST)
+    return false;

half of this as well.  Btw, there's no canonicalization for the tests so
you have to verify the false edge actually exits the loop and allow for
EQ_EXPR in case the false edge does.

+  /* Handle strlen like loops.  */
+  if (store_dr == NULL
+      && integer_zerop (pattern)
+      && TREE_CODE (reduction_iv.base) == INTEGER_CST
+      && TREE_CODE (reduction_iv.step) == INTEGER_CST
+      && integer_onep (reduction_iv.step)
+      && (types_compatible_p (TREE_TYPE (reduction_var), size_type_node)
+          || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))))
+    {

I wonder what goes wrong with a larger or smaller wrapping IV type?  The
iteration only stops when you load a NUL and the increments just wrap
along (you're using the pointer IVs to compute the strlen result).  Can't
you simply truncate?  For larger than size_type_node (actually larger
than ptr_type_node would matter I guess), the argument is that since
pointer wrapping would be undefined anyway the IV cannot wrap either.
Now, the correct check here would IMHO be

  TYPE_PRECISION (TREE_TYPE (reduction_var)) < TYPE_PRECISION (ptr_type_node)
  || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (pointer-iv-var))

?

+      if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var)))
+        {
+          const char *msg = G_("assuming signed overflow does not occur "
+                               "when optimizing strlen like loop");
+          fold_overflow_warning (msg, WARN_STRICT_OVERFLOW_MISC);
+        }

no, please don't add any new strict-overflow warnings ;)

The generate_*_builtin routines need some factoring - if you code-generate
into a gimple_seq you could use gimple_build () which would do the
fold_stmt (not sure why you do that - you should see to fold the call,
not necessarily the rest).  The replacement of reduction_var and the
dumping could be shared.  There's also GET_MODE_NAME for the printing.

I think overall the approach is sound now but the details still need work.

Thanks,
Richard.

> Cheers,
> Stefan
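P.S.: to spell out the exit-test canonicalization issue above, the check
has to look at which edge actually leaves the loop, something like
(untested):

  edge exit = single_exit (loop);
  gcond *cond_stmt = safe_dyn_cast <gcond *> (last_stmt (exit->src));
  if (!cond_stmt)
    return false;
  /* The loop iterates while *p != pattern, so accept either a NE_EXPR
     whose false edge exits the loop or an EQ_EXPR whose true edge
     does.  */
  tree_code code = gimple_cond_code (cond_stmt);
  if (!((code == NE_EXPR && (exit->flags & EDGE_FALSE_VALUE))
        || (code == EQ_EXPR && (exit->flags & EDGE_TRUE_VALUE))))
    return false;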