On Thu, Nov 1, 2012 at 2:26 PM, Teresa Johnson <tejohn...@google.com> wrote:
> On Thu, Nov 1, 2012 at 10:19 AM, Teresa Johnson <tejohn...@google.com> wrote:
>> On Wed, Oct 31, 2012 at 4:02 PM, Steven Bosscher <stevenb....@gmail.com> 
>> wrote:
>>> On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote:
>>>> Sure, I will give this a try after your verification patch tests
>>>> complete. Does this mean that the patch you posted above to
>>>> force_nonfallthru_and_redirect is no longer needed either? I'll see if
>>>> I can avoid the need for some of my fixes, although I believe at least
>>>> the function.c one will still be needed. I'll check.
>>>
>>> The force_nonfallthru change is still necessary, because
>>> force_nonfallthru should be almost a no-op in cfglayout mode. The
>>> whole concept of a fallthru edge doesn't exist in cfglayout mode: any
>>> single_succ edge is a fallthru edge until the order of the basic
>>> blocks has been determined and the insn chain is re-linked (cfglayout
>>> mode originally was developed for bb-reorder, to move blocks around
>>> more easily). So the correct patch would actually be:
>>>
>>> Index: cfgrtl.c
>>> ===================================================================
>>> --- cfgrtl.c    (revision 193046)
>>> +++ cfgrtl.c    (working copy)
>>> @@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
>>>    cfg_layout_split_edge,
>>>    rtl_make_forwarder_block,
>>>    NULL, /* tidy_fallthru_edge */
>>> -  rtl_force_nonfallthru,
>>> +  NULL, /* force_nonfallthru */
>>>    rtl_block_ends_with_call_p,
>>>    rtl_block_ends_with_condjump_p,
>>>    rtl_flow_call_edges_add,
>>>
>>> (Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge
>>> hooks, they are cfgrtl-only.)
>>>
>>> But obviously that won't work because
>>> bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in
>>> cfglayout mode. That is a bug. The call to force_nonfallthru results
>>> in a "dangling" barrier:
>>>
>>> cfgrtl.c:1523  emit_barrier_after (BB_END (jump_block));
>>>
>>> In cfglayout mode, barriers don't exist in the insns chain, and they
>>> don't have to because every edge is a fallthru edge. If there are
>>> barriers before cfglayout mode, they are either removed or linked in
>>> the basic block footer, and fixup_reorder_chain restores or inserts
>>> barriers where necessary to drop out of cfglayout mode. This
>>> emit_barrier_after call hangs a barrier after BB_END but not in the
>>> footer, and I'm pretty sure the result will be that the barrier is
>>> lost in fixup_reorder_chain. See also emit_barrier_after_bb for how
>>> inserting a barrier should be done in cfglayout mode.
>>>
>>> So in short, bbpart doesn't know what it wants to be: a cfgrtl or a
>>> cfglayout pass. It doesn't work without cfglayout but it's doing
>>> things that are only correct in the cfgrtl world and Very Wrong Indeed
>>> in cfglayout-land.
>>>
>>>
>>>> Regarding your earlier question about why we needed to add the
>>>> barrier, I need to dig up the details again but essentially I found
>>>> that the barriers were being added by bbpart, but bbro was reordering
>>>> things and the block that ended up at the border between the hot and
>>>> cold section didn't necessarily have a barrier on it because it was
>>>> not previously at the region boundary.
>>>
>>> That doesn't sound right. bbpart doesn't actually re-order the basic
>>> blocks, it only marks the blocks with the partition they will be
>>> assigned to. Whatever ends up at the border between the two partitions
>>> is not relevant: the hot section cannot end in a fall-through edge to
>>> the cold section (verify_flow_info even checks for that, see "fallthru
>>> edge crosses section boundary (bb %i)") so it must end in some
>>> explicit jump. Such jumps are always followed by a barrier. The only
>>> reason I can think of why there might be a missing barrier, is because
>>> fixup_reorder_chain has a bug and forgets to insert the barrier in
>>> some cases (and I suspect this may be the case for return patterns, or
>>> the a.m. issue of a dropper barrier).
>>>
>>> I would like to work on debugging this, but it's hard without test cases...
>>
>> I'm working on trying to reproduce some of these failures in a test
>> case I can share. So far, I have only been able to reproduce the
>> failure reported in PR 53743 in spec2006 (456.hmmer/sre_math.c). Still
>> working on getting a smaller/shareable test case for the other 2
>> issues.
>>
>> The failure in PR 53743 (assert in cfg_layout_merge_blocks) is what I
>> had fixed with my original changes to cfgrtl.c. Need to understand why
>> there is a reg crossing note between 2 bbs in the same partition.
>
> Interestingly, this turned out to be the same root cause as the
> verify_flow_info failures below. It is fixed by the same fix to
> thread_prologue_and_epilogue_insns. When the code below created the
> copy_bb and put it in e->src's partition, it made it insufficient for
> the merge blocks routine to check if the two bbs were in the same
> partition, because they were in the same partition but separated by
> the region crossing jump.
>
> I'll do some testing of the fix below, but do you have any comments on
> the correctness or the potential issue I raised (see my note just
> below the patch)?
>
> Do you recommend pursuing the move of the bb partition phase until
> later, after we leave cfglayout mode? I need to revisit to see if my
> prologue/epilogue fix below also addresses the issue I saw when I
> tried moving it.

I found that indeed it wasn't a complete fix as there can be cases
where we needed to redirect another pred branch in the hot section to
the copy_bb that was now marked cold. I also found another case
further down in the same thread_prologue_and_epilogue_insns routine
that required the exact same kind of fixup (where we create a new
block to hold a simple return and redirect edges to that block). Here
too a new block was created, put into the same partition as the pred
that we redirect to it, and that pred no longer needed to be marked
with a region crossing jump. In all of these cases the edge
redirection is handled by rtl_redirect_edge_and_branch_force, so I
simply moved the fixup there and had it fixup jumps/edges from all
preds as necessary based on the new target's partition.

Finally, I also fixed the below call to create_basic_block to
correctly insert the new basic block after any barriers following the
BB_END (e->src).

Another email coming shortly on the root cause I found for one of the
other issues I was attempting to fix in my original patch set, that
you also had questions about.

Thanks,
Teresa

>
> Thanks,
> Teresa
>
>>
>> In the hmmer test case I also hit a failures in rtl_verify_flow_info
>> and rtl_verify_flow_info_1:
>>
>> gcc  -c -o sre_math.o -DSPEC_CPU -D
>> NDEBUG    -fprofile-use -freorder-blocks-and-partition   -O2
>>      sre_math.c
>> sre_math.c: In function ‘Gammln’:
>> sre_math.c:161:1: error: EDGE_CROSSING incorrectly set across same section
>>  }
>>  ^
>> sre_math.c:161:1: error: missing barrier after block 6
>> sre_math.c:161:1: internal compiler error: verify_flow_info failed
>>
>>
>> This was due to some code in thread_prologue_and_epilogue_insns that
>> duplicated tail blocks:
>>
>>                 if (e)
>>                   {
>>                     copy_bb = create_basic_block (NEXT_INSN (BB_END 
>> (e->src)),
>>                                                   NULL_RTX, e->src);
>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>                   }
>>
>> In this case e->src (bb 6) was in the cold section and e->dest was in
>> the hot section, and e->src ended with a REG_CROSSING_JUMP followed by
>> a barrier. The new copy_bb got put into the cold section by the copy
>> partition above, leading to the first error. And because the
>> create_basic_block call inserted the new copy_bb before NEXT_INSN
>> (BB_END (e->src)), which in this case was the barrier, we ended up
>> without the barrier after the crossing edge.
>>
>> I fixed this by making the following change:
>>
>> --- function.c  (revision 192692)
>> +++ function.c  (working copy)
>> @@ -6249,9 +6249,18 @@ thread_prologue_and_epilogue_insns (void)
>>                     break;
>>                 if (e)
>>                   {
>> +                    rtx note;
>>                     copy_bb = create_basic_block (NEXT_INSN (BB_END 
>> (e->src)),
>>                                                   NULL_RTX, e->src);
>>                     BB_COPY_PARTITION (copy_bb, e->src);
>> +                    /* Remove the region crossing note from jump at end of
>> +                       e->src if it exists.  */
>> +                    note = find_reg_note (BB_END (e->src),
>> REG_CROSSING_JUMP, NULL_RTX);
>> +                    if (note)
>> +                      /* There would also have been a barrier after
>> e->src, that
>> +                         is now after copy_bb, but that shouldn't be a
>> +                         problem?.  */
>> +                      remove_note (BB_END (e->src), note);
>>                   }
>>                 else
>>                   {
>>
>> But I am not sure this is really correct in all cases - for example,
>> what if another hot bb that also didn't require a prologue branched
>> into the new cloned tail sequence, which is now cold? E.g.
>> dup_block_and_redirect will redirect all predecessors that don't need
>> a prologue to the new copy.
>>
>> I'm going to see if I can get the other 2 failures I had found to
>> trigger on spec or a smaller test case.
>>
>> Teresa
>>
>>>
>>> Ciao!
>>> Steven
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Reply via email to