On Tue, Jan 17, 2017 at 3:42 PM, Jeff Law <l...@redhat.com> wrote: > On 01/17/2017 02:15 AM, Richard Biener wrote: >> >> On Mon, Jan 16, 2017 at 11:36 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> >>> On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <l...@redhat.com> >>> wrote: >>>> >>>> On 01/16/2017 01:51 AM, Richard Biener wrote: >>>>> >>>>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <l...@redhat.com> wrote: >>>>>> >>>>>> >>>>>> At one time I know I had the max_size == size test in >>>> >>>> valid_ao_ref_for_dse. >>>>>> >>>>>> But it got lost at some point. This is what caused the Ada failure. >>>>>> >>>>>> Technically it'd be OK for the potentially dead store to have a >>>> >>>> variable >>>>>> >>>>>> size as long as the later stores covered the entire range of the >>>> >>>> potentially >>>>>> >>>>>> dead store. I doubt this happens enough to be worth checking. >>>>>> >>>>>> The ppc64 big endian failures were more interesting. We had this in >>>> >>>> the IL: >>>>>> >>>>>> >>>>>> memmove (dst, src, 0) >>>>>> >>>>>> The trimming code assumes that there's at least one live byte in the >>>> >>>> store, >>>>>> >>>>>> which obviously isn't the case here. The net result is we compute >>>> >>>> an >>>>>> >>>>>> incorrect trim and the copy goes wild with incorrect addresses and >>>> >>>> lengths. >>>>>> >>>>>> This is trivial to fix by validating that the store has a nonzero >>>> >>>> length. >>>>>> >>>>>> >>>>>> I was a bit curious how often this happened in practice because such >>>> >>>> a call >>>>>> >>>>>> is trivially dead. ~80 during a bootstrap and a few dozen in the >>>> >>>> testsuite. >>>>>> >>>>>> Given how trivial it is to detect and optimize, this patch includes >>>> >>>> removal >>>>>> >>>>>> of such calls. This hunk makes the check for zero size in >>>>>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if >>>> >>>> we add >>>>>> >>>>>> more builtin support without filtering zero size we'd regress again. >>>>> >>>>> >>>>> Interesting - we do fold memset (..., 0) away so this means we either >>>>> have an unfolded memset stmt in the IL before DSE. >>>> >>>> It's actually exposed by fre3, both in the original test and in the >>>> reduced testcase. In the reduced testcase we have this just prior to >>>> FRE3: >>>> >>>> ;; basic block 3, loop depth 0, count 0, freq 7326, maybe hot >>>> ;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) >>>> ;; pred: 2 [73.3%] (TRUE_VALUE,EXECUTABLE) >>>> _3 = MEM[(const struct vec *)_4].m_num; >>>> if (_3 != 0) >>>> goto <bb 4>; [36.64%] >>>> else >>>> goto <bb 5>; [63.36%] >>>> ;; succ: 4 [36.6%] (TRUE_VALUE,EXECUTABLE) >>>> ;; 5 [63.4%] (FALSE_VALUE,EXECUTABLE) >>>> >>>> ;; basic block 4, loop depth 0, count 0, freq 2684, maybe hot >>>> ;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) >>>> ;; pred: 3 [36.6%] (TRUE_VALUE,EXECUTABLE) >>>> _6 = vec_av_set.m_vec; >>>> _7 = _6->m_num; >>>> _8 = _7 - _3; >>>> _6->m_num = _8; >>>> _9 = (long unsigned int) _8; >>>> _10 = _9 * 4; >>>> slot.2_11 = slot; >>>> dest.3_12 = dest; >>>> memmove (dest.3_12, slot.2_11, _10); >>>> ;; succ: 5 [100.0%] (FALLTHRU,EXECUTABLE) >>>> >>>> >>>> _3 has the value _6->m_num. Thus _8 will have the value 0, which in >>>> turn makes _10 have the value zero as seen in the .fre3 dump: >>>> >>>> ;; basic block 4, loop depth 0, count 0, freq 2684, maybe hot >>>> ;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) >>>> ;; pred: 3 [36.6%] (TRUE_VALUE,EXECUTABLE) >>>> _4->m_num = 0; >>>> slot.2_11 = slot; >>>> dest.3_12 = dest; >>>> memmove (dest.3_12, slot.2_11, 0); >>>> >>>> In the full test its similar. >>>> >>>> I don't know if you want to try and catch this in FRE though. >>> >>> >>> Ah, I think I have patches for this since a long time in my tree... >>> We're folding calls in a restricted way for some historical reason. >>> >>>> If we detect in DCE (where it makes reasonable sense) rather than DSE, >>>> then we detect the dead mem* about 17 passes earlier and the dead >>>> argument setup about 20 passes earlier. In the testcase I looked at, I >>>> >>>> didn't see additional secondary optimizations enabled, but I could >>>> imagine cases where it might. Seems like a gcc-8 thing though. >>> >>> >>> I'll give it a quick look tomorrow. >> >> >> The comment before fold_stmt_inplace no longer applies (but it seems I >> simply forgot to push >> this change...). It's better to not keep unfolded stmts around, so >> I'll commit this as last bit >> of stage3 if testing is fine. >> >> Bootstrap / regtest on x86_64-unknown-linux-gnu in progress. >> >> Richard. >> >> 2017-01-17 Richard Biener <rguent...@suse.de> >> >> * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children): >> Fold calls regularly. >> >> * gcc.dg/tree-ssa/ssa-fre-57.c: New testcase. >> > Note you'll need the trivial update to the new ssa-dse testcase as it > verifies removal of the dead memmove.
Yep, the patch had other fallout so I'm postponing it for GCC 8. Richard. > jeff >