Hello Jeff,

Here is more of the data you asked for.
SPEC FP benchmarks:
a) No Path Splitting + tracer enabled. Geomean Score = 4749.726.
b) Path Splitting enabled + tracer enabled. Geomean Score = 4781.655.

Conclusion: With both Path Splitting and tracer enabled we got the maximum gains. I think we need to have the Path Splitting pass.

SPEC INT benchmarks:
a) Path Splitting enabled + tracer not enabled. Geomean Score = 3745.193.
b) No Path Splitting + tracer enabled. Geomean Score = 3738.558.
c) Path Splitting enabled + tracer enabled. Geomean Score = 3742.833.

Conclusion: We are getting more gains with Path Splitting than with tracer. With both Path Splitting and tracer enabled we are also getting gains. I think we should have the Path Splitting pass.

One more observation: Richard's concern is the creation of multiple exits when paths are split through duplication. My observation is that the tracer pass also creates multiple exits through duplication. I don't think that is an issue in practice, considering the gains we are getting from splitting paths, which enables more PRE, CSE and DCE.

Thanks & Regards
Ajit

-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com]
Sent: Wednesday, December 16, 2015 5:20 AM
To: Richard Biener
Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

On 12/11/2015 03:05 AM, Richard Biener wrote:
> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law <l...@redhat.com> wrote:
>> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>>
>>> This pass is now enabled by default with -Os but has no limits on
>>> the amount of stmts it copies.
>>
>> The more statements it copies, the more likely it is that the path
>> splitting will turn out to be useful! It's counter-intuitive.
>
> Well, it's still not appropriate for -Os (nor -O2 I think).
> -ftracer is enabled with -fprofile-use (but it is also properly driven
> to only trace hot paths) and otherwise not by default at any
> optimization level.

Definitely not appropriate for -Os. But as I mentioned, I really want to look at the tracer code as it may totally subsume path splitting.

>
> Don't see how this would work for the CFG pattern it operates on
> unless you duplicate the exit condition into that new block creating
> an even more obfuscated CFG.

Agreed, I don't see any way to fix the multiple exit problem. Then again, this all runs after the tree loop optimizer, so I'm not sure how big of an issue it is in practice.

>> It was only after I approved this code after twiddling it for Ajit
>> that I came across Honza's tracer implementation, which may in fact
>> be retargetable to these loops and do a better job. I haven't
>> experimented with that.
>
> Well, I originally suggested to merge this with the tracer pass...

I missed that, or it didn't sink into my brain.

>> Again, the more statements it copies the more likely it is to be profitable.
>> Think superblocks to expose CSE, DCE and the like.
>
> Ok, so similar to tracer (where I think the main benefit is actually
> increasing scheduling opportunities for architectures where it matters).

Right. They're both building superblocks, which has the effect of larger windows for scheduling, DCE, CSE, etc.

>
> Note that both passes are placed quite late and thus won't see much
> of the GIMPLE optimizations (DOM mainly). I wonder why they were
> not placed adjacent to each other.

Ajit had it fairly early, but that didn't play well with if-conversion. I just pushed it past if-conversion and vectorization, but before the last DOM pass. That turns out to be where tracer lives too, as you noted.

>>
>> I wouldn't lose any sleep if we disabled by default or removed, particularly
>> if we can repurpose Honza's code. In fact, I might strongly support the
>> former until we hear back from Ajit on performance data.
>
> See above for what we do with -ftracer. path-splitting should at _least_
> restrict itself to operate on optimize_loop_for_speed_p () loops.

I think we need to decide if we want the code at all, particularly given the multiple-exit problem.

The difficulty is I think Ajit posted some recent data that shows it's helping. So maybe the thing to do is ask Ajit to try the tracer independent of path splitting and take the obvious actions based on Ajit's data.

>
> It should also (even if counter-intuitive) limit the amount of stmt copying
> it does - after all there is something like an instruction cache size which
> exceeding for loops will never be a good idea (and even smaller special
> loop caches on some archs).

Yup.

>
> Note that a better heuristic than "at least more than one stmt" would be
> to have at least one PHI in the merger block. Otherwise I don't see how
> CSE opportunities could exist that we don't see without the duplication.
> And yes, more PHIs -> more possible CSE. I wouldn't say so for
> the number of stmts. So please limit the number of stmt copies!
> (after all we do limit the number of stmts we copy during jump threading!)

Let's get some more data before we try to tune path splitting. In an ideal world, the tracer can handle this for us and we just remove path splitting completely.

Jeff
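
For readers following the thread, here is a rough source-level sketch of the kind of loop shape being discussed: an if/else inside a loop whose merge block is duplicated into both arms. It is written by hand and is not taken from the patch. The function names, the loop body and the input values are all illustrative assumptions, and the actual pass works on GIMPLE basic blocks rather than on C source.

```c
#include <assert.h>

/* Before duplication: both arms of the if/else join in a merge block.
   At that point x is only known as a PHI of (a[i] - k) and 0, so the
   compiler cannot simplify the merge block per path.  */
long sum_before(const int *a, int n, int k)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        int x;
        if (a[i] > k)
            x = a[i] - k;
        else
            x = 0;
        /* merge block, shared by both paths */
        sum += x * 2 + (a[i] - k);
    }
    return sum;
}

/* After duplicating the merge block into each arm (done by hand here):
   every path sees a concrete x, which exposes CSE and DCE.  */
long sum_after(const int *a, int n, int k)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > k) {
            int d = a[i] - k;   /* CSE: a[i] - k is computed once */
            sum += d * 2 + d;
        } else {
            sum += a[i] - k;    /* DCE: x * 2 with x == 0 disappears */
        }
    }
    return sum;
}

/* Hypothetical helper: asserts that both versions agree on one input.  */
void check_equivalent(const int *a, int n, int k)
{
    assert(sum_before(a, n, k) == sum_after(a, n, k));
}
```

The merge block here contains a PHI-like value (x), which matches Richard's point that a PHI in the merger block is what makes extra CSE possible. The cost is the one he warns about: the merge block's statements now exist twice, so a larger merge block means more copied code.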