On 12/12/19 4:11 AM, Segher Boessenkool wrote:
Hi Nick,
On Sun, Dec 08, 2019 at 03:03:56PM -0500, Nicholas Krause wrote:
The first questions are:
1. What heuristics do we currently have for figuring out what state is
shared? It seems there are none. If I am correct, the first thing to do
is discuss what bits/bitmasks, or other ways, we want for figuring out
shared state.
Shared between what and what?
Between the passes in gcc. The idea is to launch certain passes on
another thread and join them up with later passes that depend on the
state used by an earlier pass. For example, if a loop pass does not
touch anything outside a function in GIMPLE or RTL, we should launch it
on another thread, then join up with the next pass that requires that
state. It seems that not all passes touch everything; some touch only
parts of either GIMPLE or RTL, so this may be worth considering. The
question was what internal compiler data can currently be used for
finding when this is the case. I don't see anything, so figuring out
how to detect this is going to be part of the challenge.
I'm going to write up an article on the GCC wiki explaining it better,
but that's a very brief version of the idea, alongside some other ideas
like figuring out whether dominator trees should or can be lockless, or
very close to it, for insertion/deletion.
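
As a rough illustration of the pass-level idea above (run passes that
touch disjoint state on separate threads, then join before a pass that
needs their results), here is a minimal sketch. Everything in it is
hypothetical: the ST_* bits, ToyPass, make_batches and run_batches are
invented for this example and do not correspond to anything in GCC's
pass manager, which has no such read/write masks today.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical state bits, invented for this sketch.
constexpr std::uint32_t ST_CFG        = 1u << 0;  // control-flow graph
constexpr std::uint32_t ST_LOOPS      = 1u << 1;  // loop structure
constexpr std::uint32_t ST_DOMINATORS = 1u << 2;  // dominator trees
constexpr std::uint32_t ST_INSNS      = 1u << 3;  // instruction stream

// A toy pass: which state it reads, which it writes, and its work.
struct ToyPass {
  std::string name;
  std::uint32_t reads;
  std::uint32_t writes;
  std::function<void()> body;
};

// Two passes conflict if either one writes state the other reads or writes.
inline bool conflicts(const ToyPass &a, const ToyPass &b) {
  return (a.writes & (b.reads | b.writes)) != 0 ||
         (b.writes & (a.reads | a.writes)) != 0;
}

// Greedily group consecutive passes into batches. A pass joins the
// current batch only if it conflicts with no pass already in it;
// otherwise it starts a new batch, so pass order is preserved.
inline std::vector<std::vector<ToyPass>>
make_batches(const std::vector<ToyPass> &passes) {
  std::vector<std::vector<ToyPass>> batches;
  for (const ToyPass &p : passes) {
    bool fits = !batches.empty();
    if (fits) {
      for (const ToyPass &q : batches.back()) {
        if (conflicts(p, q)) {
          fits = false;
          break;
        }
      }
    }
    if (fits)
      batches.back().push_back(p);
    else
      batches.push_back(std::vector<ToyPass>{p});
  }
  return batches;
}

// Run each batch with one task per pass, and wait for the whole batch
// before starting the next one. Returns the number of passes executed.
inline std::size_t
run_batches(const std::vector<std::vector<ToyPass>> &batches) {
  std::size_t executed = 0;
  for (const std::vector<ToyPass> &batch : batches) {
    std::vector<std::future<void>> pending;
    for (const ToyPass &p : batch) {
      assert(p.body && "every toy pass needs a body");
      pending.push_back(std::async(std::launch::async, p.body));
    }
    for (std::future<void> &f : pending) {
      f.get();  // join point for this batch
      ++executed;
    }
  }
  return executed;
}
```

With three toy passes where the first two only read ST_CFG and write
different bits, and the third reads what both wrote, make_batches puts
the first two in one batch and the third in its own. The hard part
raised above, working out accurate masks for real passes, is exactly
what this sketch assumes away.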
2. MD files, or reading them, seem to be a major source of shared
state. Is it possible to read from them async? It doesn't seem to be a
problem, but the current docs don't mention it, nor does it seem easy
to do.
MD files are not read *at all* by the compiler itself; they aren't
installed, even. They are read by the gen* programs when the compiler
itself is built, to create the insn-*.c files and the like.
3. There are two ways to write this for RTL: either one class for all
the state, or a core class with each major part being a subclass, like
delayed branch scheduling, etc. Not sure which is better, so thought I
would ask.
RTL as it is is pretty efficient. Please keep it that way. It also is
a dumb (and very "open") data structure, by design. See how "XEXP" and
similar work.
That could be changed of course, for non-trivial cost, but what for?
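
To make the "dumb and open" point concrete, here is a simplified toy
model of an accessor in the style of XEXP. It is only a sketch: the
toy_ and TOY_ names are made up for this example, the layout is far
simpler than the real one in rtl.h, and the real macros add optional
checking that is omitted here. The point it tries to show is that an
operand access is just a field load from an operand array, with no
encapsulation in between.

```cpp
#include <cassert>

// Toy expression codes; real RTL has many more.
enum toy_code { TOY_CONST_INT, TOY_PLUS };

struct toy_rtx_def;
typedef toy_rtx_def *toy_rtx;

// Each operand slot is a plain union; the code decides how to read it.
union toy_operand {
  long rt_int;
  toy_rtx rt_rtx;
};

struct toy_rtx_def {
  toy_code code;
  toy_operand fld[2];
};

// Accessors are bare macros over the struct, usable as lvalues too.
#define TOY_CODE(X) ((X)->code)
#define TOY_XEXP(X, N) ((X)->fld[N].rt_rtx)
#define TOY_XINT(X, N) ((X)->fld[N].rt_int)

// Build a constant node holding V.
inline toy_rtx_def toy_const_int(long v) {
  toy_rtx_def x{};
  x.code = TOY_CONST_INT;
  x.fld[0].rt_int = v;
  return x;
}

// Build a PLUS node whose two operands point at A and B.
inline toy_rtx_def toy_plus(toy_rtx a, toy_rtx b) {
  toy_rtx_def x{};
  x.code = TOY_PLUS;
  x.fld[0].rt_rtx = a;
  x.fld[1].rt_rtx = b;
  return x;
}

// Evaluate a tree made of TOY_PLUS and TOY_CONST_INT nodes.
inline long toy_eval(toy_rtx x) {
  switch (TOY_CODE(x)) {
    case TOY_CONST_INT:
      return TOY_XINT(x, 0);
    case TOY_PLUS:
      return toy_eval(TOY_XEXP(x, 0)) + toy_eval(TOY_XEXP(x, 1));
    default:
      assert(!"unhandled toy code");
      return 0;
  }
}
```

Because TOY_XEXP expands to a plain member expression, any code can
read or overwrite an operand in place. That openness is cheap, and it
is also why wrapping such a structure in classes would not be free.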
I'm not talking about changing RTL itself in terms of its
optimizations, but about rewriting it so that work queues can be read
in parallel on state that is not shared with the currently running
pass, and then joined back up to the next pass requiring it.
For example, why not run parts of the register allocation on separate
work queues if possible? I was asking Peter at Cauldron about the
register part, and he seemed to like doing something like this for the
cost of allocating registers, if I recall correctly.
Hopefully that explains it a little better,
Nick
Segher