On Tue, Jul 4, 2017 at 2:06 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Tue, Jul 4, 2017 at 12:19 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>> Hi, >>>>> For the moment, tree-predcom.c only supports >>>>> invariant/load-loads/store-loads chains. >>>>> This patch generalizes dead store elimination (or store motion) across >>>>> loop iterations in >>>>> predictive commoning pass by supporting store-store chain. As comment in >>>>> the patch: >>>>> >>>>> Apart from predictive commoning on Load-Load and Store-Load chains, we >>>>> also support Store-Store chains -- stores killed by other store can be >>>>> eliminated. Given below example: >>>>> >>>>> for (i = 0; i < n; i++) >>>>> { >>>>> a[i] = 1; >>>>> a[i+2] = 2; >>>>> } >>>>> >>>>> It can be replaced with: >>>>> >>>>> t0 = a[0]; >>>>> t1 = a[1]; >>>>> for (i = 0; i < n; i++) >>>>> { >>>>> a[i] = 1; >>>>> t2 = 2; >>>>> t0 = t1; >>>>> t1 = t2; >>>>> } >>>>> a[n] = t0; >>>>> a[n+1] = t1; >>>>> >>>>> If the loop runs more than 1 iterations, it can be further simplified >>>>> into: >>>>> >>>>> for (i = 0; i < n; i++) >>>>> { >>>>> a[i] = 1; >>>>> } >>>>> a[n] = 2; >>>>> a[n+1] = 2; >>>>> >>>>> The interesting part is this can be viewed either as general store >>>>> motion >>>>> or general dead store elimination in either intra/inter-iterations way. >>>>> >>>>> There are number of interesting facts about this enhancement: >>>>> a) This patch supports dead store elimination for both across-iteration >>>>> case and single-iteration >>>>> case. For the latter, it is dead store elimination. >>>>> b) There are advantages supporting dead store elimination in predcom, for >>>>> example, it has >>>>> complete information about memory address. On the contrary, DSE >>>>> pass can only handle >>>>> memory references with exact the same memory address expression. >>>>> c) It's cheap to support store-stores chain in predcom based on existing >>>>> code. >>>>> d) As commented, the enhancement can be viewed as either generalized dead >>>>> store elimination >>>>> or generalized store motion. I prefer DSE here. >>>>> >>>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64. Is it OK? >>>> >>>> Looks mostly ok. I have a few questions though. >>>> >>>> + /* Don't do store elimination if loop has multiple exit edges. */ >>>> + bool eliminate_store_p = single_exit (loop) != NULL; >>>> >>>> handling this would be an enhancement? IIRC LIM store-motion handles this >>>> just fine by emitting code on all exits. >>> It is an enhancement with a little bit more complication. We would >>> need to setup/record finalizer memory references for different exit >>> edges. I added TODO description for this (and following one). Is it >>> okay to pick up this in the future? >> >> Yes. >> >>>> >>>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec<chain_p> chains) >>>> { >>>> if (chain->type == CT_INVARIANT) >>>> continue; >>>> + /* Don't unroll when eliminating stores. */ >>>> + else if (chain->type == CT_STORE_STORE) >>>> + return 1; >>>> >>>> this is a hard exit value so we do not handle the case where another chain >>>> in the loop would want to unroll? (enhancement?) I'd have expected to do >>>> the same as for CT_INVARIANT here. >>> I didn't check what change is needed in case of unrolling. I am not >>> very sure if we should prefer unroll for *load chains or prefer not >>> unroll for store-store chains, because unroll in general increases >>> loop-carried register pressure for store-store chains rather than >>> decreases register pressure for *load chains. >>> I was also thinking if it's possible to restrict unrolling somehow in >>> order to enable predcom at O2. BTW, this is not common, it only >>> happens once in spec2k6 with factor forced to 1. So okay if as it is >>> now? >> >> I think it is ok for now with a TODO added. Please change the comment >> to "we can't handle unrolling when eliminating stores" (it's not clear if we >> can -- did you try? maybe add a testcase that would ICE) >> >>> >>>> >>>> + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts); >>>> + if (!chain->all_always_accessed && tree_could_trap_p (init)) >>>> + { >>>> + gimple_seq_discard (stmts); >>>> + return false; >>>> >>>> so this is the only place that remotely cares for not always performed >>>> stores. >>>> But as-is the patch doesn't seem to avoid speculating stores and thus >>>> violates the C++ memory model, aka, introduces store-data-races? The LIM >>>> store-motion code was fixed to avoid this by keeping track of whether a BB >>>> has executed to guard the stores done in the compensation code on the loop >>>> exit. >>>> >>>> That said, to "fix" this all && tree_could_trap_p cases would need to be >>>> removed >>>> (or similarly flag vars be introduced). Speculating loads that do not >>>> trap is ok >>>> (might only introduce false uninit use reports by tools like valgrind). >>> Hmm, not sure IIUC. Patch updated, is it correct (though conservative)? >> >> +static bool >> +prepare_initializers_chain_store_elim (struct loop *, chain_p chain) >> +{ >> ... >> + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts); >> + if (!chain->all_always_accessed && tree_could_trap_p (init)) >> + { >> >> remove && tree_could_trap_p and hoist the other check. >> >> Same in prepare_finalizers_chain. I think you should check >> >> if (! chain->all_always_accessed >> && ! PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)) >> return false; >> >> at the beginning of both functions and retain >> >> if (! chain->all_always_accessed && tree_could_trap_p ()) >> >> in the loops. >> >> Ok with that change and a testcase that would exercise failure/ICE of >> store chains w/ unrolling. > Hmm, now I remember maybe these > all_always_accessed/trap/data_store_race checks can be simplified. In > function suitable_component_p, we call just_once_each_iteration_p for > all references. So we shouldn't not end up with > chain->all_always_accessed == false cases, right? why means we don't > really need to check at all?
Not sure. We have this check in the existing code already. Please come up with a testcase that we might not DSE-pcom because of store speculation. Richard. > Thanks, > bin >> >> Thanks, >> Richard. >> >>> Thanks, >>> bin >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> Thanks, >>>>> bin >>>>> 2017-06-21 Bin Cheng <bin.ch...@arm.com> >>>>> >>>>> * tree-predcom.c: Revise general description of pass. >>>>> (enum chain_type): New enum type for store elimination. >>>>> (struct chain): New field supporting store elimination. >>>>> (dump_chain): Dump store-stores chain. >>>>> (release_chain): Release resources. >>>>> (split_data_refs_to_components): Compute and create component >>>>> contains only stores for elimination. >>>>> (get_chain_last_ref_at): New function. >>>>> (make_invariant_chain): Initialization. >>>>> (make_rooted_chain): Specify chain type in parameter. >>>>> (add_looparound_copies): Skip for store-stores chain. >>>>> (determine_roots_comp): Compute type of chain and pass it to >>>>> make_rooted_chain. >>>>> (initialize_root_vars_store_elim_2): New function. >>>>> (finalize_eliminated_stores): New function. >>>>> (remove_stmt): Handle store for elimination. >>>>> (execute_pred_commoning_chain): Execute predictive commoning on >>>>> store-store chains. >>>>> (determine_unroll_factor): Skip unroll for store-stores chain. >>>>> (prepare_initializers_chain_store_elim): New function. >>>>> (prepare_initializers_chain): Hanlde store-store chain. >>>>> (prepare_finalizers_chain, prepare_finalizers): New function. >>>>> (tree_predictive_commoning_loop): Return integer value indicating >>>>> if loop is unrolled or lcssa form is corrupted. >>>>> (tree_predictive_commoning): Rewrite for lcssa form if necessary. >>>>> >>>>> gcc/testsuite/ChangeLog >>>>> 2017-06-21 Bin Cheng <bin.ch...@arm.com> >>>>> >>>>> * gcc.dg/tree-ssa/predcom-dse-1.c: New test. >>>>> * gcc.dg/tree-ssa/predcom-dse-2.c: New test. >>>>> * gcc.dg/tree-ssa/predcom-dse-3.c: New test. >>>>> * gcc.dg/tree-ssa/predcom-dse-4.c: New test. >>>>> * gcc.dg/tree-ssa/predcom-dse-5.c: New test. >>>>> * gcc.dg/tree-ssa/predcom-dse-6.c: New test. >>>>> * gcc.dg/tree-ssa/predcom-dse-7.c: New test. >>>>> * gcc.dg/tree-ssa/predcom-dse-8.c: New test. >>>>> * gcc.dg/tree-ssa/predcom-dse-9.c: New test.