On Mon, 24 Jun 2019, Giuliano Belinassi wrote:

> Hi,
> 
> Parallelize GCC with Threads -- First Evaluation
> 
> Hi everyone,
> 
> I am attaching the first evaluation report here publicly for gathering
> feedback. The file is in markdown format and can easily be converted to
> PDF for better visualization.
> 
> I am also open to suggestions and ideas in order to improve the current 
> project :-)
> 
> My branch can be seen here: 
> https://gitlab.com/flusp/gcc/tree/giulianob_parallel

Thanks for your work on this!

I hadn't thought of the default_obstack and input_location issues,
so it's good that we now know about them.  The bitmap default obstack
is merely a convenience, and fortunately its (content!) lifetime is
constrained to be per-pass, though that's not fully enforced.  Your
solution is fine, I think, and, besides when parallelizing the IPA
phase, should be no worse than making the default obstack part of a
per-thread context, given how many times we zap it.

input_location shouldn't really be used...  but oh well.

As for per-pass global state, I expect that eventually all
global variables in the individual passes will be a problem;
in nearly all cases they are global out of laziness, to avoid
passing state down across functions.

You identified some global state in infrastructure code,
which are the more interesting cases.  The most relevant for
GIMPLE are the ones in tree-ssa-operands.c, tree-cfg.c
and et-forest.c, I guess.

For individual passes a first step would be to wrap
all globals into a struct we can allocate at ::execute
time and pass down as a pointer.  That's slightly less
intrusive than wrapping all of the pass in a class
but functionally equivalent.
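
A minimal sketch of that first step, in plain C++ rather than actual GCC
code.  All names here (example_pass_state, visit_stmt, example_pass_execute)
are hypothetical, and statements are stood in for by plain integers.  The
idea is only that what used to be file-scope globals becomes fields of a
struct created in the function playing the role of ::execute and handed
down explicitly:

```cpp
#include <cassert>
#include <vector>

// Hypothetical pass-local state.  In the unconverted pass these two
// members would have been file-scope globals shared by all helpers.
struct example_pass_state
{
  int n_visited = 0;
  std::vector<int> worklist;
};

// A helper that formerly touched the globals directly now receives
// the state through a pointer.
static void
visit_stmt (example_pass_state *st, int stmt_id)
{
  assert (st != nullptr);
  st->n_visited++;
  if (stmt_id % 2 == 0)
    st->worklist.push_back (stmt_id);
}

// Stand-in for a pass's ::execute: the state lives only for the
// duration of this call, so concurrent invocations on different
// functions do not share it.
static unsigned int
example_pass_execute (const std::vector<int> &stmts)
{
  example_pass_state st;
  for (int s : stmts)
    visit_stmt (&st, s);
  return static_cast<unsigned int> (st.worklist.size ());
}
```

Because each invocation owns its own state object, nothing needs locking
as long as the helpers only reach state through the pointer they are given.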

<<<
1. The GCC `object_pool_allocator`

    There is also the GCC object_pool_allocator, which is used to allocate some
    objects. Since these objects may be used later in the compilation by other
    threads, we can't simply make them private to each thread. Therefore I added a
    threadsafe_object_pool_allocator object that currently uses locks to guarantee
    safety; however, I am not able to check its correctness. This is also
    not efficient and might require a better approach later.
>>>

I guess the same applies to the GC allocator.  To make these
more efficient we'd have a per-thread freelist that we can allocate
from without locking and that we'd, once empty, refill from the
main pool in larger chunks with locking.  At thread finalization
we then have to return the freelist to the main allocator,
and for the GC allocator possibly also at garbage collection time.
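
The freelist scheme could look roughly like the following sketch.  It is
standalone C++ and not based on the real object_pool_allocator or GC
allocator interfaces; main_pool, thread_freelist, take_chunk, give_back and
the chunk size are all assumptions made for illustration.  Each thread
would own one thread_freelist; only refilling and returning slots takes the
lock:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared pool: slots move in and out in batches, under a lock.
template <typename T>
class main_pool
{
public:
  explicit main_pool (std::size_t n_slots) : m_storage (n_slots)
  {
    for (T &slot : m_storage)
      m_free.push_back (&slot);
  }

  // Move up to MAX free slots into OUT, taking the lock once per batch.
  void take_chunk (std::vector<T *> &out, std::size_t max)
  {
    std::lock_guard<std::mutex> guard (m_lock);
    for (std::size_t i = 0; i < max && !m_free.empty (); i++)
      {
        out.push_back (m_free.back ());
        m_free.pop_back ();
      }
  }

  // Return every slot in IN to the shared freelist and empty IN.
  void give_back (std::vector<T *> &in)
  {
    std::lock_guard<std::mutex> guard (m_lock);
    m_free.insert (m_free.end (), in.begin (), in.end ());
    in.clear ();
  }

  std::size_t n_free ()
  {
    std::lock_guard<std::mutex> guard (m_lock);
    return m_free.size ();
  }

private:
  std::mutex m_lock;
  std::vector<T> m_storage;   // never resized, so slot addresses stay valid
  std::vector<T *> m_free;
};

// Hypothetical per-thread freelist: allocate/remove are lock-free as long
// as the local list is non-empty.
template <typename T>
class thread_freelist
{
public:
  thread_freelist (main_pool<T> &pool, std::size_t chunk)
    : m_pool (pool), m_chunk (chunk)
  {
    assert (chunk > 0);
  }

  // "Thread finalization": hand the unused slots back to the main pool.
  ~thread_freelist () { m_pool.give_back (m_local); }

  T *allocate ()
  {
    if (m_local.empty ())
      m_pool.take_chunk (m_local, m_chunk);   // the only locked path
    if (m_local.empty ())
      return nullptr;                         // main pool exhausted
    T *slot = m_local.back ();
    m_local.pop_back ();
    return slot;
  }

  void remove (T *slot) { m_local.push_back (slot); }

private:
  main_pool<T> &m_pool;
  std::size_t m_chunk;
  std::vector<T *> m_local;
};
```

A slot freed by a thread other than the one that allocated it simply lands
on the freeing thread's list, so objects can still migrate between threads.
A real version would also want to cap the local list and flush the excess
back to the main pool.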

This scheme may also work for the bitmap default obstack
and its freelist.  I would also suggest that, when you run into
a specific issue with the default obstack, you use a separate
obstack in the respective area.  You mention issues with LTO;
are they reproducible on the branch?  I suppose you are
currently testing GCC with num_threads = 1 to make sure you
are not introducing non-threading-related issues?

Thanks again,
Richard.
