On 05/16/2018 04:12 AM, Richard Biener wrote: > On Tue, 15 May 2018, Richard Biener wrote: > >> On Tue, 15 May 2018, Richard Biener wrote: >> >>> On Tue, 15 May 2018, Richard Biener wrote: >>> >>>> On Tue, 15 May 2018, Richard Biener wrote: >>>> >>>>> On Tue, 15 May 2018, Richard Biener wrote: >>>>> >>>>>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Attached patch handles PR63185 when we reach PHI with temp != NULLL. >>>>>>> We could see the PHI and if there isn't any uses for PHI that is >>>>>>> interesting, we could ignore that ? >>>>>>> >>>>>>> Bootstrapped and regression tested on x86_64-linux-gnu. >>>>>>> Is this OK? >>>>>> >>>>>> No, as Jeff said we can't do it this way. >>>>>> >>>>>> If we end up with multiple VDEFs in the walk of defvar immediate >>>>>> uses we know we are dealing with a CFG fork. We can't really >>>>>> ignore any of the paths but we have to >>>>>> >>>>>> a) find the merge point (and the associated VDEF) >>>>>> b) verify for each each chain of VDEFs with associated VUSEs >>>>>> up to that merge VDEF that we have no uses of the to classify >>>>>> store and collect (partial) kills >>>>>> c) intersect kill info and continue walking from the merge point >>>>>> >>>>>> in b) there's the optional possibility to find sinking opportunities >>>>>> in case we have kills on some paths but uses on others. This is why >>>>>> DSE should be really merged with (store) sinking. >>>>>> >>>>>> So if we want to enhance DSEs handling of branches then we need >>>>>> to refactor the simple dse_classify_store function. Let me take >>>>>> an attempt at this today. >>>>> >>>>> First (baby) step is the following - it arranges to collect the >>>>> defs we need to continue walking from and implements trivial >>>>> reduction by stopping at (full) kills. This allows us to handle >>>>> the new testcase (which was already handled in the very late DSE >>>>> pass with the help of sinking the store). >>>>> >>>>> I took the opportunity to kill the use_stmt parameter of >>>>> dse_classify_store as the only user is only looking for whether >>>>> the kills were all clobbers which I added a new parameter for. >>>>> >>>>> I didn't adjust the byte-tracking case fully (I'm not fully understanding >>>>> the code in the case of a use and I'm not sure whether it's worth >>>>> doing the def reduction with byte-tracking). >>>>> >>>>> Your testcase can be handled by reducing the PHI and the call def >>>>> by seeing that the only use of a candidate def is another def >>>>> we have already processed. Not sure if worth special-casing though, >>>>> I'd rather have a go at "recursing". That will be the next >>>>> patch. >>>>> >>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >>>> >>>> Applied. >>>> >>>> Another intermediate one below, fixing the byte-tracking for >>>> stmt with uses. This also re-does the PHI handling by simply >>>> avoiding recursion by means of a visited bitmap and stopping >>>> at the DSE classify stmt when re-visiting it instead of failing. >>>> This covers Pratamesh loop case for which I added ssa-dse-33.c. >>>> For the do-while loop this still runs into the inability to >>>> handle two defs to walk from. >>>> >>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu. >>> >>> Ok, loop handling doesn't work in general since we run into the >>> issue that SSA form across the backedge is not representing the >>> same values. Consider >>> >>> <bb 3> >>> # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)> >>> # n_20 = PHI <0(2), n_7(4)> >>> # .MEM_13 = VDEF <.MEM_22> >>> bytes[n_20] = _4; >>> if (n_20 > 7) >>> goto <bb 5>; >>> >>> <bb 4> >>> n_7 = n_20 + 1; >>> # .MEM_15 = VDEF <.MEM_13> >>> bytes[n_20] = _5; >>> goto <bb 3>; >>> >>> then when classifying the store in bb4, visiting the PHI node >>> gets us to the store in bb3 which appears to be killing. >>> >>> if (gimple_code (temp) == GIMPLE_PHI) >>> - defvar = PHI_RESULT (temp); >>> + { >>> + /* If we visit this PHI by following a backedge then reset >>> + any info in ref that may refer to SSA names which we'd need >>> + to PHI translate. */ >>> + if (gimple_bb (temp) == gimple_bb (stmt) >>> + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt), >>> + gimple_bb (temp))) >>> + /* ??? ref->ref may not refer to SSA names or it may only >>> + refer to SSA names that are invariant with respect to the >>> + loop represented by this PHI node. */ >>> + ref->ref = NULL_TREE; >>> + defvar = PHI_RESULT (temp); >>> + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar)); >>> + } >>> >>> should be a workable solution for that. I'm checking that, but >>> eventually you can think of other things that might prevent us from >>> handling backedges. Note the current code tries to allow >>> looking across loops but not handle backedges of loops the >>> original stmt belongs to. >> >> Just to mention before I leave for the day I think I've identified >> a latent issue where I just fail to produce a testcase right now >> which is that non-return exits from a function are not reliably >> visited given they do not all have virtual operands, like >> for example resx. Thus consider >> >> void foo (int *p, int x) >> { >> *p = 0; >> if (x) >> resx; >> *p = 1; >> return; >> } >> >> where we will never visit any stmts on the resx path and thus think >> the *p = 0 store is dead (all with current trunk of course). >> >> Will continue to dig tomorrow. > > PR85803. > > I am currently re-testing the following on x86_64-unknown-linux-gnu > with --param dse-max-alias-queries-per-store=100000 in BOOT_CFLAGS. > > I'll refrain from doing the separate-walks-for-each-def thing > but will try to fix PR85757 by pattern matching the > def-has-single-virtual-use-that-is-another-def-in-the-vector case > in which case we can remove it from further processing. > > Richard. > > 2018-05-16 Richard Biener <rguent...@suse.de> > > * params.def (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE): New param. > * doc/invoke.texi (dse-max-alias-queries-per-store): Document. > * tree-ssa-dse.c: Include tree-ssa-loop.h. > (check_name): New callback. > (dse_classify_store): Track cycles via a visited bitmap of PHI > defs and simplify handling of in-loop and across loop dead stores > and properly fail for loop-variant refs. Handle byte-tracking with > multiple defs. Use PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE for > limiting the walk. > > * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase. > * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise. Seems quite reasonable as well.
jeff