On Sun, 1 Jul 2018 at 12:56, Aldy Hernandez <al...@redhat.com> wrote: > > > > On 06/29/2018 02:50 PM, Jeff Law wrote: > > [ Returning to another old patch... ] > > > > On 11/07/2017 10:33 AM, Aldy Hernandez wrote: > >> [One more time, but without rejected HTML mail, because apparently this > >> is my first post to gcc-patches *ever* ;-)]. > >> > >> Howdy! > >> > >> While poking around in the backwards threader I noticed that we bail if > >> we have already seen a starting BB. > >> > >> /* Do not jump-thread twice from the same block. */ > >> if (bitmap_bit_p (threaded_blocks, entry->src->index) > >> > >> This limitation discards paths that are sub-paths of paths that have > >> already been threaded. > >> > >> The following patch scans the remaining to-be-threaded paths to identify > >> if any of them start from the same point, and are thus sub-paths of the > >> just-threaded path. By removing the common prefix of blocks in upcoming > >> threadable paths, and then rewiring first non-common block > >> appropriately, we expose new threading opportunities, since we are no > >> longer starting from the same BB. We also simplify the would-be > >> threaded paths, because we don't duplicate already duplicated paths. > >> > >> This sounds confusing, but the documentation for the entry point to my > >> patch (adjust_paths_after_duplication) shows an actual example: > >> > >> +/* After an FSM path has been jump threaded, adjust the remaining FSM > >> + paths that are subsets of this path, so these paths can be safely > >> + threaded within the context of the new threaded path. > >> + > >> + For example, suppose we have just threaded: > >> + > >> + 5 -> 6 -> 7 -> 8 -> 12 => 5 -> 6' -> 7' -> 8' -> 12' > >> + > >> + And we have an upcoming threading candidate: > >> + 5 -> 6 -> 7 -> 8 -> 15 -> 20 > >> + > >> + This function adjusts the upcoming path into: > >> + 8' -> 15 -> 20 > >> > >> Notice that we will no longer have two paths that start from the same > >> BB. One will start with bb5, while the adjusted path will start with > >> bb8'. With this we kill two birds-- we are able to thread more paths, > >> and these paths will avoid duplicating a whole mess of things that have > >> already been threaded. > >> > >> The attached patch is a subset of some upcoming work that can live on > >> its own. It bootstraps and regtests. Also, by running it on a handful > >> of .ii files, I can see that we are able to thread sub-paths that we > >> previously dropped on the floor. More is better, right? :) > >> > >> To test this, I stole Jeff's method of using cachegrind to benchmark > >> instruction counts and conditional branches > >> (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html). > >> > >> Basically, I bootstrapped two compilers, with and without improvements, > >> and used each to build a stage1 trunk. Each of these stage1-trunks were > >> used on 246 .ii GCC files I have lying around from a bootstrap some > >> random point last year. I used no special flags on builds apart from > >> --enable-languages=c,c++. > >> > >> Although I would've wished a larger improvement, this works comes for > >> free, as it's just a subset of other work I'm hacking on. > >> > >> Without further ado, here are my monumental, earth shattering improvements: > >> > >> Conditional branches > >> Without patch: 411846839709 > >> With patch: 411831330316 > >> %changed: -0.0037660% > >> > >> Number of instructions > >> Without patch: 2271844650634 > >> With patch: 2271761153578 > >> %changed: -0.0036754% > >> > >> > >> OK for trunk? > >> Aldy > >> > >> p.s. There is a lot of debugging/dumping code in my patch, which I can > >> gladly remove if/when approved. It helps keep my head straight while > >> looking at this spaghetti :). > >> > >> curr.patch > >> > >> > >> gcc/ > >> > >> * tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid > >> dereferencing path[] beyond its length. > >> (debug_path): New. > >> (debug_all_paths): New. > >> (rewire_first_differing_edge): New. > >> (adjust_paths_after_duplication): New. > >> (duplicate_thread_path): Call adjust_paths_after_duplication. > >> Add new argument. > >> (thread_through_all_blocks): Add new argument to > >> duplicate_thread_path. > > This is fine for the trunk. I'd keep the dumping code as-is. It'll be > > useful in the future :-) > > Retested against current trunk and committed. >
Hi, I've noticed a regression on aarch64: FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps threaded: 3" very likely caused by this patch (appeared between 262282 and 262294) Christophe > Thanks. > > Aldy