On 07/04/2018 02:12 AM, Aldy Hernandez wrote:
> 
> 
> On 07/03/2018 08:16 PM, Jeff Law wrote:
>> On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
>>> On 07/02/2018 07:08 AM, Christophe Lyon wrote:
>>>
>>>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>>>>>>> While poking around in the backwards threader I noticed that we bail if
>>>>>>>
>>>>>>>
>>>>>>> we have already seen a starting BB.
>>>>>>>
>>>>>>>          /* Do not jump-thread twice from the same block.  */
>>>>>>>          if (bitmap_bit_p (threaded_blocks, entry->src->index)
>>>>>>>
>>>>>>> This limitation discards paths that are sub-paths of paths that have
>>>>>>> already been threaded.
>>>>>>>
>>>>>>> The following patch scans the remaining to-be-threaded paths to identify
>>>>>>>
>>>>>>>
>>>>>>> if any of them start from the same point, and are thus sub-paths of the
>>>>>>>
>>>>>>>
>>>>>>> just-threaded path.  By removing the common prefix of blocks in upcoming
>>>>>>>
>>>>>>>
>>>>>>> threadable paths, and then rewiring first non-common block
>>>>>>> appropriately, we expose new threading opportunities, since we are no
>>>>>>>
>>>>>>> longer starting from the same BB.  We also simplify the would-be
>>>>>>> threaded paths, because we don't duplicate already duplicated paths.
>>> [snip]
>>>> Hi,
>>>>
>>>> I've noticed a regression on aarch64:
>>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
>>>> threaded: 3"
>>>> very likely caused by this patch (appeared between 262282 and 262294)
>>>>
>>>> Christophe
>>>
>>> The test needs to be adjusted here.
>>>
>>> The long story is that the aarch64 IL is different at thread3 time in
>>> that it has 2 profitable sub-paths that can now be threaded with my
>>> patch.  This is causing the threaded count to be 5 for aarch64, versus 3
>>> for x86 64.  Previously we couldn't thread these in aarch64, so the
>>> backwards threader would bail.
>>>
>>> One can see the different threading opportunities by sticking
>>> debug_all_paths() at the top of thread_through_all_blocks().  You will
>>> notice that aarch64 has far more candidates to begin with.  The IL on
>>> the x86 backend, has no paths that start on the same BB.  The aarch64,
>>> on the other hand, has many to choose from:
>>>
>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>>> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>>> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>>> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
>>>
>>> Some of these prove unprofitable, but 2 more than before are profitable now.
>>>
>>>
>>>
>>> BTW, I see another threading related failure on aarch64 which is
>>> unrelated to my patch, and was previously there:
>>>
>>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
>>> threaded"
>>>
>>> This is probably another IL incompatibility between architectures.
>>>
>>> Anyways... the attached path fixes the regression.  I have added a note
>>> to the test explaining the IL differences.  We really should rewrite all
>>> the threading tests (I am NOT volunteering ;-)).
>>>
>>> OK for trunk?
>>> Aldy
>>>
>>> curr.patch
>>>
>>>
>>> gcc/testsuite/
>>>
>>>     * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
>>>     has a slightly different IL that provides more threading
>>>     opportunities.
>> OK.
>>
>> WRT rewriting the tests.  I'd certainly agree that we don't have the
>> right set of knobs to allow us to characterize the target nor do we have
>> the right dumping/scanning facilities to describe and query the CFG
>> changes.
>>
>> The fact that the IL changes so much across targets is a sign that
>> target dependency (probably BRANCH_COST) is twiddling the gimple we
>> generate.  I strongly suspect we'd be a lot better off if we tackled the
>> BRANCH_COST problem first.
> 
> Huh.  I've always accepted differing IL between architectures as a
> necessary evil for things like auto-vectorization and the like.
Yes.  We've made a conscious decision that introducing target
dependencies for the autovectorizer makes sense.  BRANCH_COST on the
other hand is a different beast :-)

> 
> What's the ideal plan here? A knob to set default values for target
> dependent variables that can affect IL layout?  Then we could pass
> -fthis-is-an-IL-test and things be normalized?
Well, lots of things.

I'd like decisions about how to expand branches deferred until rtl
expansion.  Kai was poking at this in the past but never really got any
traction.

Many tests should turn into gimple IL tests.

And I'd like a better framework for testing what we're doing to the IL.


Jeff

Reply via email to