On Mon, Jan 16, 2017 at 11:36 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote: >>On 01/16/2017 01:51 AM, Richard Biener wrote: >>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <l...@redhat.com> wrote: >>>> >>>> At one time I know I had the max_size == size test in >>valid_ao_ref_for_dse. >>>> But it got lost at some point. This is what caused the Ada failure. >>>> >>>> Technically it'd be OK for the potentially dead store to have a >>variable >>>> size as long as the later stores covered the entire range of the >>potentially >>>> dead store. I doubt this happens enough to be worth checking. >>>> >>>> The ppc64 big endian failures were more interesting. We had this in >>the IL: >>>> >>>> memmove (dst, src, 0) >>>> >>>> The trimming code assumes that there's at least one live byte in the >>store, >>>> which obviously isn't the case here. The net result is we compute >>an >>>> incorrect trim and the copy goes wild with incorrect addresses and >>lengths. >>>> This is trivial to fix by validating that the store has a nonzero >>length. >>>> >>>> I was a bit curious how often this happened in practice because such >>a call >>>> is trivially dead. ~80 during a bootstrap and a few dozen in the >>testsuite. >>>> Given how trivial it is to detect and optimize, this patch includes >>removal >>>> of such calls. This hunk makes the check for zero size in >>>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if >>we add >>>> more builtin support without filtering zero size we'd regress again. >>> >>> Interesting - we do fold memset (..., 0) away so this means we either >>> have an unfolded memset stmt in the IL before DSE. >>It's actually exposed by fre3, both in the original test and in the >>reduced testcase. In the reduced testcase we have this just prior to >>FRE3: >> >>;; basic block 3, loop depth 0, count 0, freq 7326, maybe hot >>;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) >>;; pred: 2 [73.3%] (TRUE_VALUE,EXECUTABLE) >> _3 = MEM[(const struct vec *)_4].m_num; >> if (_3 != 0) >> goto <bb 4>; [36.64%] >> else >> goto <bb 5>; [63.36%] >>;; succ: 4 [36.6%] (TRUE_VALUE,EXECUTABLE) >>;; 5 [63.4%] (FALSE_VALUE,EXECUTABLE) >> >>;; basic block 4, loop depth 0, count 0, freq 2684, maybe hot >>;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) >>;; pred: 3 [36.6%] (TRUE_VALUE,EXECUTABLE) >> _6 = vec_av_set.m_vec; >> _7 = _6->m_num; >> _8 = _7 - _3; >> _6->m_num = _8; >> _9 = (long unsigned int) _8; >> _10 = _9 * 4; >> slot.2_11 = slot; >> dest.3_12 = dest; >> memmove (dest.3_12, slot.2_11, _10); >>;; succ: 5 [100.0%] (FALLTHRU,EXECUTABLE) >> >> >>_3 has the value _6->m_num. Thus _8 will have the value 0, which in >>turn makes _10 have the value zero as seen in the .fre3 dump: >> >>;; basic block 4, loop depth 0, count 0, freq 2684, maybe hot >>;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) >>;; pred: 3 [36.6%] (TRUE_VALUE,EXECUTABLE) >> _4->m_num = 0; >> slot.2_11 = slot; >> dest.3_12 = dest; >> memmove (dest.3_12, slot.2_11, 0); >> >>In the full test its similar. >> >>I don't know if you want to try and catch this in FRE though. > > Ah, I think I have patches for this since a long time in my tree... We're > folding calls in a restricted way for some historical reason. > >>If we detect in DCE (where it makes reasonable sense) rather than DSE, >>then we detect the dead mem* about 17 passes earlier and the dead >>argument setup about 20 passes earlier. In the testcase I looked at, I >> >>didn't see additional secondary optimizations enabled, but I could >>imagine cases where it might. Seems like a gcc-8 thing though. > > I'll give it a quick look tomorrow.
The comment before fold_stmt_inplace no longer applies (but it seems I simply forgot to push this change...). It's better to not keep unfolded stmts around, so I'll commit this as last bit of stage3 if testing is fine. Bootstrap / regtest on x86_64-unknown-linux-gnu in progress. Richard. 2017-01-17 Richard Biener <rguent...@suse.de> * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children): Fold calls regularly. * gcc.dg/tree-ssa/ssa-fre-57.c: New testcase.
p3
Description: Binary data