On Mon, Oct 25, 2021 at 6:58 PM Andrew MacLeod <amacl...@redhat.com> wrote:
>
> On 10/20/21 6:28 AM, Aldy Hernandez wrote:
> > Sometimes we can solve a candidate path without having to recurse
> > further back.  This can mostly happen in fully resolving mode, because
> > we can ask the ranger what the range on entry to the path is, but
> > there's no reason this can't always apply.  This one-liner removes
> > the fully-resolving restriction.
> >
> > I'm tickled pink to see how many things we now get quite early
> > in the compilation.  I actually had to disable jump threading entirely
> > for a few tests because the early threader was catching things
> > disturbingly early.  Also, as Richi predicted, I saw a lot of pre-VRP
> > cleanups happening.
> >
> > I was going to commit this as obvious, but I think the test changes
> > merit discussion.
> >
> > We've been playing games with gcc.dg/tree-ssa/ssa-thread-11.c for quite
> > some time.  Every time a threading pass gets smarter, we push the
> > check further down the pipeline.  We've officially run out of dumb
> > threading passes to disable ;-).  In the last year we've gone up from a
> > handful of threads, to 34 threads with the current combination of
> > options.  I doubt this is testing anything useful any more, so I've
> > removed it.
> >
> > Similarly for gcc.dg/tree-ssa/ssa-dom-thread-4.c.  We used to thread 3
> > jump threads, but they were disallowed because of loop rotation.  Then
> > we started catching more jump threads in VRP2 threading so we tested
> > there.  With this patch though, we triple the number of threads found
> > from 11 to 31.  I believe this test has outlived its usefulness, and
> > I've removed it.  Note that even though we have these outrageous
> > possibilities for this test, the block copier ultimately chops them
> > down (23 survive though).
>
> I'm running into an issue with ssa-dom-thread-4.c when trying to run
> ranger for the VRP2 pass.  It reduces the number of threads to 2, and
> upon closer inspection as to why, I see:
>
> unsigned char
> bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
>                        const_bitmap kill)
> {
>    unsigned char changed = 0;
>
>    bitmap_element *dst_elt;
>    const bitmap_element *a_elt, *b_elt, *kill_elt, *dst_prev;
>
>    while (a_elt || b_elt)
>      {
>
> Ranger determines that a_elt and b_elt are used in the guard before they
> are defined, so it assumes UNDEFINED and removes a condition check.
>
> So it seems like this entire test case is predicated on undefined
> behaviour?  FWIW, if I initialize them, I get 0 threads...

Hah.  That makes sense.  As the threaders have gotten smarter, the
scan-tree-dump-times has moved later on in the pipeline to less
capable threaders.  Currently it's testing for 3 threads in the first
VRP threader pass.  With my proposed patch I bet we see an UNDEFINED
somewhere in the calculation, and bail on the entire thread as
unreachable.
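
For illustration only, here is a minimal sketch of the initialized variant
Andrew describes.  It is not the actual testcase: struct elt and
count_guard_iterations are hypothetical stand-ins, and the loop body is
invented just to make the function complete.  Once the pointers have a
known value on entry, the guard is trivially false and there is nothing
left for a threader to find.

```c
#include <stddef.h>

/* Hypothetical stand-in for bitmap_element; not the real GCC type.  */
struct elt
{
  struct elt *next;
};

/* Same guard shape as the quoted testcase, but with a_elt and b_elt
   initialized.  In the testcase as written they have no initializer,
   so reading them in the guard is undefined behaviour.  */
unsigned
count_guard_iterations (void)
{
  const struct elt *a_elt = NULL, *b_elt = NULL;
  unsigned iterations = 0;

  /* Both pointers are NULL on entry, so the body never runs.  */
  while (a_elt || b_elt)
    {
      iterations++;
      if (a_elt)
        a_elt = a_elt->next;
      if (b_elt)
        b_elt = b_elt->next;
    }

  return iterations;
}
```

Dropping the two initializers gives back the used-before-defined form,
where the compiler is free to assume anything about the guard.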

Aldy
