On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>> Hi, >>> For the moment, tree-predcom.c only supports >>> invariant/load-loads/store-loads chains. >>> This patch generalizes dead store elimination (or store motion) across loop >>> iterations in >>> predictive commoning pass by supporting store-store chain. As comment in >>> the patch: >>> >>> Apart from predictive commoning on Load-Load and Store-Load chains, we >>> also support Store-Store chains -- stores killed by other store can be >>> eliminated. Given below example: >>> >>> for (i = 0; i < n; i++) >>> { >>> a[i] = 1; >>> a[i+2] = 2; >>> } >>> >>> It can be replaced with: >>> >>> t0 = a[0]; >>> t1 = a[1]; >>> for (i = 0; i < n; i++) >>> { >>> a[i] = 1; >>> t2 = 2; >>> t0 = t1; >>> t1 = t2; >>> } >>> a[n] = t0; >>> a[n+1] = t1; >>> >>> If the loop runs more than 1 iterations, it can be further simplified >>> into: >>> >>> for (i = 0; i < n; i++) >>> { >>> a[i] = 1; >>> } >>> a[n] = 2; >>> a[n+1] = 2; >>> >>> The interesting part is this can be viewed either as general store motion >>> or general dead store elimination in either intra/inter-iterations way. >>> >>> There are number of interesting facts about this enhancement: >>> a) This patch supports dead store elimination for both across-iteration >>> case and single-iteration >>> case. For the latter, it is dead store elimination. >>> b) There are advantages supporting dead store elimination in predcom, for >>> example, it has >>> complete information about memory address. On the contrary, DSE pass >>> can only handle >>> memory references with exact the same memory address expression. >>> c) It's cheap to support store-stores chain in predcom based on existing >>> code. >>> d) As commented, the enhancement can be viewed as either generalized dead >>> store elimination >>> or generalized store motion. I prefer DSE here. >>> >>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64. Is it OK? >> >> Looks mostly ok. I have a few questions though. >> >> + /* Don't do store elimination if loop has multiple exit edges. */ >> + bool eliminate_store_p = single_exit (loop) != NULL; >> >> handling this would be an enhancement? IIRC LIM store-motion handles this >> just fine by emitting code on all exits. > It is an enhancement with a little bit more complication. We would > need to setup/record finalizer memory references for different exit > edges. I added TODO description for this (and following one). Is it > okay to pick up this in the future?
Yes. >> >> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec<chain_p> chains) >> { >> if (chain->type == CT_INVARIANT) >> continue; >> + /* Don't unroll when eliminating stores. */ >> + else if (chain->type == CT_STORE_STORE) >> + return 1; >> >> this is a hard exit value so we do not handle the case where another chain >> in the loop would want to unroll? (enhancement?) I'd have expected to do >> the same as for CT_INVARIANT here. > I didn't check what change is needed in case of unrolling. I am not > very sure if we should prefer unroll for *load chains or prefer not > unroll for store-store chains, because unroll in general increases > loop-carried register pressure for store-store chains rather than > decreases register pressure for *load chains. > I was also thinking if it's possible to restrict unrolling somehow in > order to enable predcom at O2. BTW, this is not common, it only > happens once in spec2k6 with factor forced to 1. So okay if as it is > now? I think it is ok for now with a TODO added. Please change the comment to "we can't handle unrolling when eliminating stores" (it's not clear if we can -- did you try? maybe add a testcase that would ICE) > >> >> + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts); >> + if (!chain->all_always_accessed && tree_could_trap_p (init)) >> + { >> + gimple_seq_discard (stmts); >> + return false; >> >> so this is the only place that remotely cares for not always performed >> stores. >> But as-is the patch doesn't seem to avoid speculating stores and thus >> violates the C++ memory model, aka, introduces store-data-races? The LIM >> store-motion code was fixed to avoid this by keeping track of whether a BB >> has executed to guard the stores done in the compensation code on the loop >> exit. >> >> That said, to "fix" this all && tree_could_trap_p cases would need to be >> removed >> (or similarly flag vars be introduced). Speculating loads that do not >> trap is ok >> (might only introduce false uninit use reports by tools like valgrind). > Hmm, not sure IIUC. Patch updated, is it correct (though conservative)? +static bool +prepare_initializers_chain_store_elim (struct loop *, chain_p chain) +{ ... + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts); + if (!chain->all_always_accessed && tree_could_trap_p (init)) + { remove && tree_could_trap_p and hoist the other check. Same in prepare_finalizers_chain. I think you should check if (! chain->all_always_accessed && ! PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)) return false; at the beginning of both functions and retain if (! chain->all_always_accessed && tree_could_trap_p ()) in the loops. Ok with that change and a testcase that would exercise failure/ICE of store chains w/ unrolling. Thanks, Richard. > Thanks, > bin >> >> Thanks, >> Richard. >> >>> Thanks, >>> bin >>> 2017-06-21 Bin Cheng <bin.ch...@arm.com> >>> >>> * tree-predcom.c: Revise general description of pass. >>> (enum chain_type): New enum type for store elimination. >>> (struct chain): New field supporting store elimination. >>> (dump_chain): Dump store-stores chain. >>> (release_chain): Release resources. >>> (split_data_refs_to_components): Compute and create component >>> contains only stores for elimination. >>> (get_chain_last_ref_at): New function. >>> (make_invariant_chain): Initialization. >>> (make_rooted_chain): Specify chain type in parameter. >>> (add_looparound_copies): Skip for store-stores chain. >>> (determine_roots_comp): Compute type of chain and pass it to >>> make_rooted_chain. >>> (initialize_root_vars_store_elim_2): New function. >>> (finalize_eliminated_stores): New function. >>> (remove_stmt): Handle store for elimination. >>> (execute_pred_commoning_chain): Execute predictive commoning on >>> store-store chains. >>> (determine_unroll_factor): Skip unroll for store-stores chain. >>> (prepare_initializers_chain_store_elim): New function. >>> (prepare_initializers_chain): Hanlde store-store chain. >>> (prepare_finalizers_chain, prepare_finalizers): New function. >>> (tree_predictive_commoning_loop): Return integer value indicating >>> if loop is unrolled or lcssa form is corrupted. >>> (tree_predictive_commoning): Rewrite for lcssa form if necessary. >>> >>> gcc/testsuite/ChangeLog >>> 2017-06-21 Bin Cheng <bin.ch...@arm.com> >>> >>> * gcc.dg/tree-ssa/predcom-dse-1.c: New test. >>> * gcc.dg/tree-ssa/predcom-dse-2.c: New test. >>> * gcc.dg/tree-ssa/predcom-dse-3.c: New test. >>> * gcc.dg/tree-ssa/predcom-dse-4.c: New test. >>> * gcc.dg/tree-ssa/predcom-dse-5.c: New test. >>> * gcc.dg/tree-ssa/predcom-dse-6.c: New test. >>> * gcc.dg/tree-ssa/predcom-dse-7.c: New test. >>> * gcc.dg/tree-ssa/predcom-dse-8.c: New test. >>> * gcc.dg/tree-ssa/predcom-dse-9.c: New test.