On 9/8/21 3:49 PM, Richard Biener wrote:
On Wed, Sep 8, 2021 at 3:25 PM Aldy Hernandez <al...@redhat.com> wrote:
It would be helpful to have the patch causing the issue to look at the IL.
But as Micha said, there needs to be a perfect loop nest for interchange
to work.
Richard.
Absolutely! I'm attaching the reduced testcase, as well as the patch.
The problematic thread shows up in the thread2 dump:
Checking profitability of path (backwards): bb:3 (4 insns) bb:9 (0
insns) bb:5
Control statement insns: 2
Overall: 2 insns
Registering FSM jump thread: (5, 9) incoming edge; (9, 3) (3, 8)
nocopy; (3, 8)
Well, so as Micha said, the threading destroys the outer loop by
making it have two entries - we actually _do_ recover from this
but it results in a non-empty latch of the outer loop and thus
the loop is no longer a perfect nest. The interchange pass has
special-sauce to recover from the loop store motion applied
but not to handle the kind of loop rotation the jump threading
caused.
Understood. The backedge into BB8 and the non-empty latch seem to cause
problems.
The forward threader guards against this by simply disallowing
threadings that involve different loops. As I see
The thread in question (5->9->3) is all within the same outer loop,
though. BTW, the backward threader also disallows threading across
different loops (see path_crosses_loops variable).
the threading done here should be 7->3 (outer loop entry) to bb 8
rather than one involving the backedge. Also note the condition
is invariant in the loop and in fact subsumed by the condition
outside of the loop and it should have been simplified by
VRP after pass_ch but I see there's a jump threading pass
inbetween pass_ch and the next VRP which is likely the
problem.
A 7->3->8 thread would cross loops though, because 7 is outside the
outer loop.
Why does jump threading not try to simplify a condition before
attempting to thread through it? After all ranger should be around?
That's a very good question ;-). The current code does not use the
ranger to solve unknowns. Anything further back than the first block in
a path will return varying. The plan was to add this ranger solving
functionality as a follow-up.
I have a whole slew of pending patches adding precisely this
functionality. We should be able to solve ranges outside the path with
ranger, as well as relationals occurring in a path.
However, even if there are alternate ways of threading this IL,
something like 5->9->3 could still happen. We need a way to disallow
this. I'm having a hard time determining the hammer for this. I would
vote for disabling threading through latches, but it seems the backward
threader is aware of this scenario and allows it anyhow (see
threaded_through_latch variable). Ughh.
Thanks for looking into this.
Aldy