Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

Teresa Johnson Wed, 25 Sep 2013 22:42:42 -0700

On Wed, Sep 25, 2013 at 2:33 PM, Teresa Johnson <tejohn...@google.com> wrote:
> On Tue, Sep 24, 2013 at 11:25 AM, Teresa Johnson <tejohn...@google.com> wrote:
>> On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>
>>>> I looked at one that failed after 100 as well (20031204-1.c). In this
>>>> case, it was due to expansion which was creating multiple branches/bbs
>>>> from a logical OR and guessing incorrectly on how to assign the
>>>> counts:
>>>>
>>>>  if (octets == 4 && (*cp == ':' || *cp == '\0')) {
>>>>
>>>> The (*cp == ':' || *cp == '\0') part looked like the following going
>>>> into RTL expansion:
>>>>
>>>>   [20031204-1.c : 31:33] _29 = _28 == 58;
>>>>   [20031204-1.c : 31:33] _30 = _28 == 0;
>>>>   [20031204-1.c : 31:33] _31 = _29 | _30;
>>>>   [20031204-1.c : 31:18] if (_31 != 0)
>>>>     goto <bb 16>;
>>>>   else
>>>>     goto <bb 19>;
>>>>
>>>> where the result of the OR was always true, so bb 16 had a count of
>>>> 100 and bb 19 a count of 0. When it was expanded, the expanded version
>>>> of the above turned into 2 bbs with a branch in between. Both
>>>> comparisons were done in the first bb, but the first bb checked
>>>> whether the result of the *cp == '\0' compare was true, and if not
>>>> branched to the check for whether the *cp == ':' compare was true. It
>>>> gave the branch to the second check against ':' a count of 0, so that
>>>> bb got a count of 0 and was split out, and put the count of 100 on the
>>>> fall through assuming the compare with '\0' always evaluated to true.
>>>> In reality, this OR condition was always true because *cp was ':', not
>>>> '\0'. Therefore, the count of 0 on the second block with the check for
>>>> ':' was incorrect, we ended up trying to execute it, and failed.
>>>
>>> I see, we produce:
>>> ;; if (_26 != 0)
>>>
>>> (insn 94 93 95 (set (reg:CCZ 17 flags)
>>>         (compare:CCZ (reg:QI 107 [ D.2184 ])
>>>             (const_int 0 [0]))) a.c:31 -1
>>>      (nil))
>>>
>>> (insn 95 94 96 (set (reg:QI 122 [ D.2186 ])
>>>         (eq:QI (reg:CCZ 17 flags)
>>>             (const_int 0 [0]))) a.c:31 -1
>>>      (nil))
>>>
>>> (insn 96 95 97 (set (reg:CCZ 17 flags)
>>>         (compare:CCZ (reg:QI 122 [ D.2186 ])
>>>             (const_int 0 [0]))) a.c:31 -1
>>>      (nil))
>>>
>>> (jump_insn 97 96 98 (set (pc)
>>>         (if_then_else (ne (reg:CCZ 17 flags)
>>>                 (const_int 0 [0]))
>>>             (label_ref 100)
>>>             (pc))) a.c:31 -1
>>>      (expr_list:REG_BR_PROB (const_int 6100 [0x17d4])
>>>         (nil)))
>>>
>>> (insn 98 97 99 (set (reg:CCZ 17 flags)
>>>         (compare:CCZ (reg:QI 108 [ D.2186 ])
>>>             (const_int 0 [0]))) a.c:31 -1
>>>      (nil))
>>>
>>> (jump_insn 99 98 100 (set (pc)
>>>         (if_then_else (eq (reg:CCZ 17 flags)
>>>                 (const_int 0 [0]))
>>>             (label_ref 0)
>>>             (pc))) a.c:31 -1
>>>      (expr_list:REG_BR_PROB (const_int 3900 [0xf3c])
>>>         (nil)))
>>>
>>> (code_label 100 99 0 14 "" [0 uses])
>>>
>>> That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)"
>>>
>>> First I think the logic of do_jump should really be moved to trees.  It is 
>>> not
>>> doing things that can not be adequately represented by gimple.
>>>
>>> I am not that certain we want to move it before profiling though.
>>>>
>>>> Presumably we had the correct profile data for both blocks, but the
>>>> accuracy was reduced when the OR was represented as a logical
>>>> computation with a single branch. We could change the expansion code
>>>> to do something different, e.g. treat as a 50-50 branch. But we would
>>>> still end up with integer truncation issues when there was a single
>>>> training run. But that could be dealt with conservatively in the
>>>
>>> Yep, but it is still better than what we have now - if the test above was
>>> in hot part of program (i.e. not executed once), we will end up optimizing
>>> the second conditional for size.
>>>
>>> So I think it is do_jump bug to not distribute probabilities across the two
>>> conditoinals introduced.
>>>> bbpart code as I suggested for the jump threading issue above. I.e. a
>>>> cold block with incoming non-cold edges conservatively not marked cold
>>>> for splitting.
>>>
>>> Yep, we can probably do that, but we ought to fix the individual cases
>>> above at least for resonable number of runs.
>>
>> I made this change and it removed a few of the failures.
>>
>> I looked at another case that still failed with 1 train run but passed
>> with 100. It turned out to be another truncation issue exposed by RTL
>> expansion, where we created some control flow for a memset builtin
>> which was in a block with an execution count of 1. Some of the blocks
>> got frequencies less than half the original block, so the count was
>> rounded down or truncated to 0. I noticed that in this case (as well
>> as the jump threading case I fixed by looking for non-zero incoming
>> edges in partitioning) that the bb frequency was non-zero.
>>
>> Why not just have probably_never_executed_bb_p return simply return
>> false bb->frequency is non-zero (right now it does the opposite -
>> returns true when bb->frequency is 0)? Making this change removed a
>> bunch of other failures. With this change as well, there are only 3
>> cases that still fail with 1 train run that pass with 100. Need to
>> look at those.
>
> FYI, These turned out to be more jump threading issues. I am currently
> working on getting jump threading profile updates to work properly, I
> think I'm pretty close. Haven't had a chance to look at do_jump yet.


Correction, it was not the 3 tests I mentioned above, but a different
set of tests being affected by this jump threading issue. I have a
patch I am regression testing. But there are other profile insanities
being caused upstream of jump threading that I haven't tracked down.
So I will also test and send the patch to handle some of these in the
splitting/cold cold detection code.  It also contains the change to
probably_never_executed_bb_p that I mention above to return false when
bb->frequency is non-zero.

Teresa

>
> Teresa
>
>>
>>>
>>> Will you look into logic of do_jump or shall I try to dive in?
>>
>> I can take a look, but probably won't have a chance until late this
>> week. If you don't get to it before then I will see if I can figure
>> out why it is applying the branch probabilities this way.
>>
>> Teresa
>>
>>>
>>> Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

Reply via email to