On Wed, Jul 4, 2018 at 10:12 AM Aldy Hernandez <al...@redhat.com> wrote: > > > > On 07/03/2018 08:16 PM, Jeff Law wrote: > > On 07/03/2018 03:31 AM, Aldy Hernandez wrote: > >> On 07/02/2018 07:08 AM, Christophe Lyon wrote: > >> > >>>>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote: > >>>>>> While poking around in the backwards threader I noticed that we bail if > >>>>>> > >>>>>> we have already seen a starting BB. > >>>>>> > >>>>>> /* Do not jump-thread twice from the same block. */ > >>>>>> if (bitmap_bit_p (threaded_blocks, entry->src->index) > >>>>>> > >>>>>> This limitation discards paths that are sub-paths of paths that have > >>>>>> already been threaded. > >>>>>> > >>>>>> The following patch scans the remaining to-be-threaded paths to > >>>>>> identify > >>>>>> > >>>>>> if any of them start from the same point, and are thus sub-paths of the > >>>>>> > >>>>>> just-threaded path. By removing the common prefix of blocks in > >>>>>> upcoming > >>>>>> > >>>>>> threadable paths, and then rewiring first non-common block > >>>>>> appropriately, we expose new threading opportunities, since we are no > >>>>>> longer starting from the same BB. We also simplify the would-be > >>>>>> threaded paths, because we don't duplicate already duplicated paths. > >> [snip] > >>> Hi, > >>> > >>> I've noticed a regression on aarch64: > >>> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps > >>> threaded: 3" > >>> very likely caused by this patch (appeared between 262282 and 262294) > >>> > >>> Christophe > >> > >> The test needs to be adjusted here. > >> > >> The long story is that the aarch64 IL is different at thread3 time in > >> that it has 2 profitable sub-paths that can now be threaded with my > >> patch. This is causing the threaded count to be 5 for aarch64, versus 3 > >> for x86 64. Previously we couldn't thread these in aarch64, so the > >> backwards threader would bail. > >> > >> One can see the different threading opportunities by sticking > >> debug_all_paths() at the top of thread_through_all_blocks(). You will > >> notice that aarch64 has far more candidates to begin with. The IL on > >> the x86 backend, has no paths that start on the same BB. The aarch64, > >> on the other hand, has many to choose from: > >> > >> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, > >> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, > >> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, > >> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35, > >> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35, > >> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35, > >> path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17, > >> path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17, > >> path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19, > >> > >> Some of these prove unprofitable, but 2 more than before are profitable > >> now. > >> > >> > >> BTW, I see another threading related failure on aarch64 which is > >> unrelated to my patch, and was previously there: > >> > >> FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps > >> threaded" > >> > >> This is probably another IL incompatibility between architectures. > >> > >> Anyways... the attached path fixes the regression. I have added a note > >> to the test explaining the IL differences. We really should rewrite all > >> the threading tests (I am NOT volunteering ;-)). > >> > >> OK for trunk? > >> Aldy > >> > >> curr.patch > >> > >> > >> gcc/testsuite/ > >> > >> * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64 > >> has a slightly different IL that provides more threading > >> opportunities. > > OK. > > > > WRT rewriting the tests. I'd certainly agree that we don't have the > > right set of knobs to allow us to characterize the target nor do we have > > the right dumping/scanning facilities to describe and query the CFG changes. > > > > The fact that the IL changes so much across targets is a sign that > > target dependency (probably BRANCH_COST) is twiddling the gimple we > > generate. I strongly suspect we'd be a lot better off if we tackled the > > BRANCH_COST problem first. > > Huh. I've always accepted differing IL between architectures as a > necessary evil for things like auto-vectorization and the like. > > What's the ideal plan here? A knob to set default values for target > dependent variables that can affect IL layout? Then we could pass > -fthis-is-an-IL-test and things be normalized?
Use gimple testcases. It's currently a bit awkward in some cases but it worked for me in a few cases. Richard. > Thanks. > Aldy