On Wed, Feb 17, 2016 at 8:30 AM, Jeff Law <l...@redhat.com> wrote: > On 02/14/2016 11:38 AM, Richard Biener wrote: >> >> On February 14, 2016 5:35:13 PM GMT+01:00, Jeff Law <l...@redhat.com> >> wrote: >>> >>> On 02/12/2016 10:21 AM, Jeff Law wrote: >>>>> >>>>> But really we simply need a better DSE algorithm. >>>> >>>> So the way to fix DSE is to keep the existing algorithm and track the >>>> hunks of Complex/aggregates that have been set a second time. >>>> >>>> My first thought was to implement this as an inverted bitmap. ie, >>> >>> set >>>> >>>> it to 1 for every byte in the complex/aggregate that is set by the >>> >>> first >>>> >>>> store. Clear bits for each byte in subsequent stores to the pieces. >>> >>> If >>>> >>>> the bitmap reaches an empty state, then the initial store is dead. >>>> >>>> Adjusting *ref could work too, but would probably be painful if the >>>> subsequent stores don't happen in a convenient order. >>> >>> So that was scary easy. We should have done this a long time ago. >>> >>> Essentially I call ao_get_ref_base to get the offset/size/max_size >>> filled in for the first statement. Those are used to initialize the >>> live bytes bitfield, as long as max_size != -1. >>> >>> Then when we have a possible killing statement, we use the ao in a >>> similar manner to determine which bytes to clear (taking care that the >>> base is the same between the two references and that in the killing >>> statement that the size/max_size are the same.). >>> >>> When all the live bytes are zero then we've killed the original >>> statement. >>> >>> It's ~20 lines of code. >>> >>> I need to pull together some additional tests, but it looks likely >>> we'll >>> be able to wrap this up easily for gcc-6. >> >> >> BTW, we had sth like this before but it had both correctness and more >> importantly scalability issues. > > Just a couple more tibits. > > I instrumented a bootstrap -- the improved DSE finds ~20k additional DSE > opportunities during a GCC bootstrap that could not be found by the current > DSE. Yes, 20k additional statements deleted by tree DSE. Yow!
Well, DCE also can do quite some DSE and it runs after DSE - did that 20k more DSE affect the overall end-result? > Of those additional opportunities, none require bit level tracking. So we > can just punt when the size/offset is not byte sized/aligned. Yep. I expect us to eventually lower all those bit-precision stuff. > Of those additional opportunities > 99% are for sizes of 64 bytes or > smaller. Thus we can pack those into 1 or 2 bitmap elements, depending on > the starting offset. So the bitmap side will be efficient with no real > searching if we choose our PARAM value wisely. So then please use a uint64_t or even uint32_t mask please. Which means a fixed size SBITMAP (32 bits) if you like to use the bitmap interface. > The outliers are, well, strange. There were cases where we found new DSE > opportunities for objects of size 2k bytes or larger. There weren't many of > these, but I was surprised at the size. Most likely it's a clobber or mem* > thing that's participating in DSE. But I haven't looked closely at those > cases yet. I suspect it's memset followed by actually initializing all elements. We have quite some of those I think. > There's a ton of statements that are clobbering zero-sized objects. My code > can determine when those clobbers are redundant (with some later clobber), > but I haven't looked closely to see if that's actually a good thing to do or > not. > > Anyway, I still don't see anything which makes me think this can't wrap-up > in the immediate future. Can you share your work-in-progress patch? Thanks, Richard. > jeff