-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Richard Biener
Sent: Wednesday, December 16, 2015 3:27 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal <ajit.kumar.agar...@xilinx.com> wrote:
> Hello Jeff:
>
> Here is more of the data you asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled.
>    Geomean Score = 4749.726.
> b) Path Splitting enabled + tracer enabled.
>    Geomean Score = 4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got maximum gains.
> I think we need to have the Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
>    Geomean Score = 3745.193.
> b) No Path Splitting + tracer enabled.
>    Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
>    Geomean Score = 3742.833.

>> I suppose with SPEC you mean SPEC CPU 2006?

The performance data is with respect to the SPEC CPU 2000 benchmarks.

>> Can you disclose the architecture you did the measurements on and the compile
>> flags you used otherwise?

Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz
cpu cores  : 10
cache size : 25600 KB

I have used -O3 and enabled the tracer with -ftracer.

Thanks & Regards
Ajit

>> Note that tracer does a very good job only when paired with FDO so can you
>> re-run SPEC with FDO and compare with path-splitting enabled on top of that?

Thanks,
Richard.

> Conclusion: We are getting more gains with Path Splitting as compared to
> tracer. With both Path Splitting and tracer enabled we are also getting
> gains.
> I think we should have the Path Splitting pass.
>
> One more observation: Richard's concern is the creation of multiple
> exits when splitting paths through duplication. My observation is that the
> tracer pass also creates multiple exits through duplication. I
> don’t think that's an issue in practice, considering the gains we
> are getting from splitting paths through more PRE, CSE and DCE.
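For readers new to the transformation being debated, here is a hand-written, source-level sketch of the kind of duplication path splitting performs (the GCC manual describes `-fsplit-paths` as splitting paths leading to loop backedges). It is only an approximation: the real pass works on the GIMPLE CFG, not on C source, and the function names `sum_joined` and `sum_split` are hypothetical. The join block of the if/else (the statement after it plus the loop's exit test) is copied into each arm, so each copy sees a single definition of `t`; on the `else` path that value is the constant 2 and can be folded. The split version also has two exit tests, which is the multiple-exit concern raised in this thread.

```c
#include <stddef.h>

/* Before: both arms of the if/else merge into one join block.
   At the join, t is effectively a PHI of (a[i] * 3, 2).
   Assumes n >= 1, because the body runs before the exit test. */
int sum_joined(const int *a, size_t n)
{
    int sum = 0;
    size_t i = 0;
    do {
        int t;
        if (a[i] & 1)
            t = a[i] * 3;
        else
            t = 2;
        /* Join block: shared by both paths, ends in the exit test. */
        sum += t;
        i++;
    } while (i < n);
    return sum;
}

/* After (hand-written): the join block is duplicated into each arm.
   Each copy sees one definition of t, and the loop has two exits.
   Same assumption: n >= 1. */
int sum_split(const int *a, size_t n)
{
    int sum = 0;
    size_t i = 0;
    for (;;) {
        if (a[i] & 1) {
            sum += a[i] * 3;
            i++;
            if (i >= n)
                break;      /* exit 1 */
        } else {
            sum += 2;       /* PHI argument folded to a constant */
            i++;
            if (i >= n)
                break;      /* exit 2 */
        }
    }
    return sum;
}
```

Both functions compute the same result; for example, with `a = {1, 2, 3, 4, 5}` and `n = 5` each returns 31.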
>
> Thanks & Regards
> Ajit
>
> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya
> Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on
> tree ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>>
>>>> This pass is now enabled by default with -Os but has no limits on
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path
>>> splitting will turn out to be useful! It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think). -ftracer
>> is enabled with -fprofile-use (but it is also properly driven to only
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os. But as I mentioned, I really want to
> look at the tracer code as it may totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on
>> unless you duplicate the exit condition into that new block, creating
>> an even more obfuscated CFG.
> Agreed, I don't see any way to fix the multiple exit problem. Then again,
> this all runs after the tree loop optimizer, so I'm not sure how big of an
> issue it is in practice.
>
>>> It was only after I approved this code after twiddling it for Ajit
>>> that I came across Honza's tracer implementation, which may in fact
>>> be retargetable to these loops and do a better job. I haven't
>>> experimented with that.
>>
>> Well, I originally suggested to merge this with the tracer pass...
> I missed that, or it didn't sink into my brain.
>
>>> Again, the more statements it copies the more likely it is to be profitable.
>>> Think superblocks to expose CSE, DCE and the like.
>>
>> Ok, so similar to tracer (where I think the main benefit is actually
>> increasing scheduling opportunities for architectures where it matters).
> Right. They're both building superblocks, which has the effect of larger
> windows for scheduling, DCE, CSE, etc.
>
>>
>> Note that both passes are placed quite late and thus won't see much
>> of the GIMPLE optimizations (DOM mainly). I wonder why they were not
>> placed adjacent to each other.
> Ajit had it fairly early, but that didn't play well with if-conversion.
> I just pushed it past if-conversion and vectorization, but before
> the last DOM pass. That turns out to be where tracer lives too as you noted.
>
>>>
>>> I wouldn't lose any sleep if we disabled it by default or removed it,
>>> particularly if we can repurpose Honza's code. In fact, I might
>>> strongly support the former until we hear back from Ajit on performance
>>> data.
>>
>> See above for what we do with -ftracer. path-splitting should at
>> _least_ restrict itself to operate on optimize_loop_for_speed_p () loops.
> I think we need to decide if we want the code at all, particularly
> given the multiple-exit problem.
>
> The difficulty is I think Ajit posted some recent data that shows it's
> helping. So maybe the thing to do is ask Ajit to try the tracer
> independent of path splitting and take the obvious actions based on
> Ajit's data.
>
>>
>> It should also (even if counter-intuitive) limit the amount of stmt
>> copying it does - after all there is something like an instruction cache
>> size, and exceeding it for loops will never be a good idea (and there are
>> even smaller special loop caches on some archs).
> Yup.
>
>>
>> Note that a better heuristic than "at least more than one stmt" would
>> be to have at least one PHI in the merger block. Otherwise I don't
>> see how CSE opportunities could exist that we don't see without the
>> duplication.
>> And yes, more PHIs -> more possible CSE.
>> I wouldn't say so for the
>> number of stmts. So please limit the number of stmt copies!
>> (after all we do limit the number of stmts we copy during jump
>> threading!)
> Let's get some more data before we try to tune path splitting. In an
> ideal world, the tracer can handle this for us and we just remove path
> splitting completely.
>
> Jeff
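Richard's suggested restrictions (only hot loops, at least one PHI in the merge block, a cap on copied statements) can be combined into a single profitability predicate. This is a minimal sketch under stated assumptions, not GCC code: `struct join_block_summary`, `should_split_path` and `MAX_COPIED_STMTS` are hypothetical names, the `hot_loop` flag merely stands in for the result of `optimize_loop_for_speed_p ()`, and the cap of 8 statements is an arbitrary placeholder, not a tuned value.

```c
#include <stdbool.h>

/* Hypothetical cap on statements copied per split; arbitrary value. */
#define MAX_COPIED_STMTS 8

/* Hypothetical summary of the join (merge) block that would be
   duplicated into its predecessors.  Not a GCC data structure. */
struct join_block_summary {
    bool hot_loop;   /* stand-in for optimize_loop_for_speed_p () */
    int num_phis;    /* PHI nodes in the join block */
    int num_stmts;   /* statements that would be copied */
};

/* Returns true when duplicating the join block looks worthwhile. */
bool should_split_path(const struct join_block_summary *jb)
{
    /* Only grow code in loops the compiler optimizes for speed. */
    if (!jb->hot_loop)
        return false;

    /* Without a PHI, each copy sees exactly the values the original
       block saw, so (per the argument above) duplication exposes no
       new CSE opportunities. */
    if (jb->num_phis < 1)
        return false;

    /* Bound code growth, in the same spirit as the statement limit
       applied during jump threading. */
    if (jb->num_stmts > MAX_COPIED_STMTS)
        return false;

    return true;
}
```

Checking the cheap hotness test first keeps cold loops out entirely; the PHI test and the size cap then encode the two heuristics Richard proposes. A hot join block with one PHI and four statements would pass, while the same block with no PHIs, or with nine statements, would be rejected.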