On Thu, Nov 1, 2012 at 2:26 PM, Teresa Johnson <tejohn...@google.com> wrote: > On Thu, Nov 1, 2012 at 10:19 AM, Teresa Johnson <tejohn...@google.com> wrote: >> On Wed, Oct 31, 2012 at 4:02 PM, Steven Bosscher <stevenb....@gmail.com> >> wrote: >>> On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote: >>>> Sure, I will give this a try after your verification patch tests >>>> complete. Does this mean that the patch you posted above to >>>> force_nonfallthru_and_redirect is no longer needed either? I'll see if >>>> I can avoid the need for some of my fixes, although I believe at least >>>> the function.c one will still be needed. I'll check. >>> >>> The force_nonfallthru change is still necessary, because >>> force_nonfallthru should be almost a no-op in cfglayout mode. The >>> whole concept of a fallthru edge doesn't exist in cfglayout mode: any >>> single_succ edge is a fallthru edge until the order of the basic >>> blocks has been determined and the insn chain is re-linked (cfglayout >>> mode originally was developed for bb-reorder, to move blocks around >>> more easily). So the correct patch would actually be: >>> >>> Index: cfgrtl.c >>> =================================================================== >>> --- cfgrtl.c (revision 193046) >>> +++ cfgrtl.c (working copy) >>> @@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = { >>> cfg_layout_split_edge, >>> rtl_make_forwarder_block, >>> NULL, /* tidy_fallthru_edge */ >>> - rtl_force_nonfallthru, >>> + NULL, /* force_nonfallthru */ >>> rtl_block_ends_with_call_p, >>> rtl_block_ends_with_condjump_p, >>> rtl_flow_call_edges_add, >>> >>> (Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge >>> hooks, they are cfgrtl-only.) >>> >>> But obviously that won't work because >>> bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in >>> cfglayout mode. That is a bug. The call to force_nonfallthru results >>> in a "dangling" barrier: >>> >>> cfgrtl.c:1523 emit_barrier_after (BB_END (jump_block)); >>> >>> In cfglayout mode, barriers don't exist in the insns chain, and they >>> don't have to because every edge is a fallthru edge. If there are >>> barriers before cfglayout mode, they are either removed or linked in >>> the basic block footer, and fixup_reorder_chain restores or inserts >>> barriers where necessary to drop out of cfglayout mode. This >>> emit_barrier_after call hangs a barrier after BB_END but not in the >>> footer, and I'm pretty sure the result will be that the barrier is >>> lost in fixup_reorder_chain. See also emit_barrier_after_bb for how >>> inserting a barrier should be done in cfglayout mode. >>> >>> So in short, bbpart doesn't know what it wants to be: a cfgrtl or a >>> cfglayout pass. It doesn't work without cfglayout but it's doing >>> things that are only correct in the cfgrtl world and Very Wrong Indeed >>> in cfglayout-land. >>> >>> >>>> Regarding your earlier question about why we needed to add the >>>> barrier, I need to dig up the details again but essentially I found >>>> that the barriers were being added by bbpart, but bbro was reordering >>>> things and the block that ended up at the border between the hot and >>>> cold section didn't necessarily have a barrier on it because it was >>>> not previously at the region boundary. >>> >>> That doesn't sound right. bbpart doesn't actually re-order the basic >>> blocks, it only marks the blocks with the partition they will be >>> assigned to. Whatever ends up at the border between the two partitions >>> is not relevant: the hot section cannot end in a fall-through edge to >>> the cold section (verify_flow_info even checks for that, see "fallthru >>> edge crosses section boundary (bb %i)") so it must end in some >>> explicit jump. Such jumps are always followed by a barrier. The only >>> reason I can think of why there might be a missing barrier, is because >>> fixup_reorder_chain has a bug and forgets to insert the barrier in >>> some cases (and I suspect this may be the case for return patterns, or >>> the a.m. issue of a dropper barrier). >>> >>> I would like to work on debugging this, but it's hard without test cases... >> >> I'm working on trying to reproduce some of these failures in a test >> case I can share. So far, I have only been able to reproduce the >> failure reported in PR 53743 in spec2006 (456.hmmer/sre_math.c). Still >> working on getting a smaller/shareable test case for the other 2 >> issues. >> >> The failure in PR 53743 (assert in cfg_layout_merge_blocks) is what I >> had fixed with my original changes to cfgrtl.c. Need to understand why >> there is a reg crossing note between 2 bbs in the same partition. > > Interestingly, this turned out to be the same root cause as the > verify_flow_info failures below. It is fixed by the same fix to > thread_prologue_and_epilogue_insns. When the code below created the > copy_bb and put it in e->src's partition, it made it insufficient for > the merge blocks routine to check if the two bbs were in the same > partition, because they were in the same partition but separated by > the region crossing jump. > > I'll do some testing of the fix below, but do you have any comments on > the correctness or the potential issue I raised (see my note just > below the patch)? > > Do you recommend pursuing the move of the bb partition phase until > later, after we leave cfglayout mode? I need to revisit to see if my > prologue/epilogue fix below also addresses the issue I saw when I > tried moving it.
I found that indeed it wasn't a complete fix as there can be cases where we needed to redirect another pred branch in the hot section to the copy_bb that was now marked cold. I also found another case further down in the same thread_prologue_and_epilogue_insns routine that required the exact same kind of fixup (where we create a new block to hold a simple return and redirect edges to that block). Here too a new block was created, put into the same partition as the pred that we redirect to it, and that pred no longer needed to be marked with a region crossing jump. In all of these cases the edge redirection is handled by rtl_redirect_edge_and_branch_force, so I simply moved the fixup there and had it fixup jumps/edges from all preds as necessary based on the new target's partition. Finally, I also fixed the below call to create_basic_block to correctly insert the new basic block after any barriers following the BB_END (e->src). Another email coming shortly on the root cause I found for one of the other issues I was attempting to fix in my original patch set, that you also had questions about. Thanks, Teresa > > Thanks, > Teresa > >> >> In the hmmer test case I also hit a failures in rtl_verify_flow_info >> and rtl_verify_flow_info_1: >> >> gcc -c -o sre_math.o -DSPEC_CPU -D >> NDEBUG -fprofile-use -freorder-blocks-and-partition -O2 >> sre_math.c >> sre_math.c: In function ‘Gammln’: >> sre_math.c:161:1: error: EDGE_CROSSING incorrectly set across same section >> } >> ^ >> sre_math.c:161:1: error: missing barrier after block 6 >> sre_math.c:161:1: internal compiler error: verify_flow_info failed >> >> >> This was due to some code in thread_prologue_and_epilogue_insns that >> duplicated tail blocks: >> >> if (e) >> { >> copy_bb = create_basic_block (NEXT_INSN (BB_END >> (e->src)), >> NULL_RTX, e->src); >> BB_COPY_PARTITION (copy_bb, e->src); >> } >> >> In this case e->src (bb 6) was in the cold section and e->dest was in >> the hot section, and e->src ended with a REG_CROSSING_JUMP followed by >> a barrier. The new copy_bb got put into the cold section by the copy >> partition above, leading to the first error. And because the >> create_basic_block call inserted the new copy_bb before NEXT_INSN >> (BB_END (e->src)), which in this case was the barrier, we ended up >> without the barrier after the crossing edge. >> >> I fixed this by making the following change: >> >> --- function.c (revision 192692) >> +++ function.c (working copy) >> @@ -6249,9 +6249,18 @@ thread_prologue_and_epilogue_insns (void) >> break; >> if (e) >> { >> + rtx note; >> copy_bb = create_basic_block (NEXT_INSN (BB_END >> (e->src)), >> NULL_RTX, e->src); >> BB_COPY_PARTITION (copy_bb, e->src); >> + /* Remove the region crossing note from jump at end of >> + e->src if it exists. */ >> + note = find_reg_note (BB_END (e->src), >> REG_CROSSING_JUMP, NULL_RTX); >> + if (note) >> + /* There would also have been a barrier after >> e->src, that >> + is now after copy_bb, but that shouldn't be a >> + problem?. */ >> + remove_note (BB_END (e->src), note); >> } >> else >> { >> >> But I am not sure this is really correct in all cases - for example, >> what if another hot bb that also didn't require a prologue branched >> into the new cloned tail sequence, which is now cold? E.g. >> dup_block_and_redirect will redirect all predecessors that don't need >> a prologue to the new copy. >> >> I'm going to see if I can get the other 2 failures I had found to >> trigger on spec or a smaller test case. >> >> Teresa >> >>> >>> Ciao! >>> Steven >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413