On 12/12/19 11:38 AM, Segher Boessenkool wrote:
On Thu, Dec 12, 2019 at 10:21:03AM -0500, Nicholas Krause wrote:
On 12/12/19 4:11 AM, Segher Boessenkool wrote:
On Sun, Dec 08, 2019 at 03:03:56PM -0500, Nicholas Krause wrote:
The first questions are:
1. What heuristics do we currently have for figuring out what state is
shared? It seems there are none. If I am correct, the first thing to do
is discuss what bits/bitmasks we want for figuring out shared state, or
other ways of doing so.
Shared between what and what?
Between the passes in gcc. The idea is to launch certain passes in gcc
on another thread and join up with later passes that depend on the
state used by an earlier pass. For example, if a loop pass does not
touch anything outside a function in GIMPLE or RTL, we should launch it
on another thread, then join up at the next pass that requires the
state. It seems that not all passes touch everything, and some touch
only parts of either GIMPLE or RTL, so this may be worth considering.
The question was what internal compiler data can currently be used to
find out when this is the case. I don't see anything, so figuring out
how to detect this is going to be part of the challenge.
Every pass depends on the previous pass.  You cannot do later passes
before earlier ones (for the same function).

Each separate pass can do its own thing in whatever way it wants of
course.  Nothing else will access the insn stream while a pass is
running.
My idea was to use something similar to:
https://linux-kernel-labs.github.io/master/labs/deferred_work.html

It's under the section on workqueues. It would allow us to defer
non-shared work between the passes. For example, if a loop
pass a has state that c depends on but b does not, why not launch
it on a thread and join back up at c?

Here it is, sort of drawn out:

a-> has state that c depends on so launch a workqueue
    -> b no shared state with a so continue this in parallel
    ->  c waits for a to join back up with shared or dependency state

Of course, if it's the same function this would not work, but it seems
that a lot of passes touch only certain parts of GIMPLE or RTL.
Therefore doing the work in a workqueue would be ideal if the state
is not shared.
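
The a/b/c diagram above can be sketched in a few lines of standard C++.
This is only an illustration of the launch-and-join shape, not GCC code:
PassLog, run_pass_a, run_pass_b, run_pass_c and run_pipeline are
hypothetical names, and the sketch simply assumes that a and b touch
disjoint data, which is exactly the property that would have to be
established for real passes.

```cpp
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for pass state; nothing here exists in GCC.
struct PassLog {
    std::vector<std::string> a_side;  // written by pass a, then by pass c
    std::vector<std::string> b_side;  // written only by pass b
};

// Each "pass" just records that it ran, so the ordering can be inspected.
void run_pass_a(std::vector<std::string>& state) { state.push_back("a"); }
void run_pass_b(std::vector<std::string>& state) { state.push_back("b"); }
void run_pass_c(std::vector<std::string>& state) { state.push_back("c"); }

PassLog run_pipeline() {
    PassLog log;
    // Launch a on another thread; it writes only to log.a_side.
    std::future<void> a_done =
        std::async(std::launch::async, run_pass_a, std::ref(log.a_side));
    // b is assumed to share no state with a, so it runs here meanwhile.
    run_pass_b(log.b_side);
    // c depends on a's state: join with a before c may start.
    a_done.get();
    run_pass_c(log.a_side);
    return log;
}
```

After run_pipeline() returns, a_side holds "a" then "c" and b_side holds
"b". If a and b shared any state, the same code would have a data race,
so the join point alone does not make the scheme safe.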

The real question is how to detect when to launch a workqueue, based
on how shared state is passed from an earlier pass's transformations
of GIMPLE or RTL to the passes that depend on it.
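
One possible shape for the bits/bitmasks mentioned earlier is a
read/write footprint per pass. The sketch below is an assumption made
for illustration only: the StateBits categories, PassFootprint and
passes_conflict are invented names, and GCC has no such annotation on
its passes today.

```cpp
#include <cstdint>

// Hypothetical categories of state a pass might touch.
enum StateBits : std::uint32_t {
    STATE_CFG      = 1u << 0,
    STATE_LOOPS    = 1u << 1,
    STATE_INSNS    = 1u << 2,
    STATE_REGALLOC = 1u << 3,
};

// What a pass reads and what it writes, as bitmasks of StateBits.
struct PassFootprint {
    std::uint32_t reads;
    std::uint32_t writes;
};

// Two passes conflict if either one writes state that the other
// reads or writes. Passes that only read the same state do not conflict.
constexpr bool passes_conflict(PassFootprint x, PassFootprint y) {
    return (x.writes & (y.reads | y.writes)) != 0
        || (y.writes & x.reads) != 0;
}
```

With footprints like these, a pass could be deferred to a workqueue only
when passes_conflict is false against every pass it would overlap with.
The hard part remains assigning accurate footprints in the first place.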
3. There are two ways to write this for RTL: either one class for all
the state, or a core class with each major part being a subclass, like
delayed branch scheduling, etc. Not sure which is better, so I thought
I would ask.
RTL as it is is pretty efficient.  Please keep it that way.  It also is
a dumb (and very "open") data structure, by design.  See how "XEXP" and
similar work.

That could be changed of course, for non-trivial cost, but what for?
I'm not talking about changing RTL itself in terms of its optimizations,
but about rewriting it so that work queues can read non-shared state in
parallel with the currently running pass and join back up at
the next pass requiring it.

For example, why not run parts of the register allocation on separate
work queues if possible? I was asking Peter at Cauldron about the
register part and he seemed to like doing something like this for
the cost of allocating registers, if I recall correctly.

Hopefully that explains it a little better,
No, I still do not understand what you mean at all :-/

I'm going to look a little further into how to do this on the register
side, but hopefully my diagram above makes sense.

Nick
Sorry for the confusion,
Nick
Segher
