On Thu, Aug 8, 2013 at 3:23 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
> Hi,
> Martin Liska was kind enough to generate a disk seeking graph of gimp startup
> with his function reordering. His code simply measures the time of first
> execution of each function and orders the functions in that order. The
> functions stay in the subsections (unlikely/startup/exit/hot/normal), which
> are then glued together in this order.
>
> I am attaching disk seeking with and without -freorder-blocks-and-partition
> (with your patch).
>
> In 2.pdf you can see two increasing sequences in the text segment. If I am
> not mistaken, the bottom one comes from the hot section and the top one from
> the normal section. The big unused part at the bottom is the unlikely
> section, since most of gimp is not trained.
2.pdf is reordered with Martin's technique?

> Now 1.pdf is with -freorder-blocks-and-partition and your patch. You can see
> there is a third sequence near the bottom of the text section. That is the
> beginning of the unlikely section, so it tracks jumps where we fall into the
> cold section of a function.

1.pdf is generated using the usual FDO + -freorder-blocks-and-partition
(i.e. not using Martin's technique)?

> It still seems rather bad (i.e. a good part of the unlikely section is
> actually used). I think the dominator-based approach is not going to work
> too reliably (I can "fix" my testcase to contain multiple nested
> conditionals, and then the heuristic about predecessors won't help).

Yes, this doesn't look good. Did you use the latest version of my patch that
doesn't walk the dominators? Do you know how many training runs are done for
this benchmark?

I think a lot of the issues you pointed out (the hot loop preceded by
non-looping conditional code as in your earlier example, or multiple nested
conditionals) come from the fact that the cold cutoff is not 0, but some
number less than the number of training runs. Perhaps the cutoff for
splitting should be 0. Then the main issue that needs to be corrected is
profile insanities, not code that is executed once (since that would not be
marked cold). The only other issue I can think of here is that the training
data was not representative and didn't exercise these blocks.

> What about simply walking the CFG from entry through all edges with non-0
> counts, making all reachable blocks hot, and forcibly making any hot blocks
> not reachable this way reachable?

Is this different from what I currently have plus changing the cold cutoff
to 0? In that case any blocks reachable through non-0 edges should be non-0
and marked hot, and the current patch forces the most frequent paths to all
hot blocks to be hot.
Thanks,
Teresa

> I think we are really looking primarily for dead parts of the functions
> (sanity checks/error handling) that should not be visited by the train run.
> We can then see how to make the heuristic more aggressive?
>
> Honza

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413