On Mon, 24 Jun 2019, Giuliano Belinassi wrote: > Hi, > > Parallelize GCC with Threads -- First Evaluation > > Hi everyone, > > I am attaching the first evaluation report here publicly for gathering > feedback. The file is in markdown format and it can be easily be converted to > PDF for better visualization. > > I am also open to suggestions and ideas in order to improve the current > project :-) > > My branch can be seen here: > https://gitlab.com/flusp/gcc/tree/giulianob_parallel
Thanks for your work on this! I didn't think of the default_obstack and input_location issues so it's good we get to know these. The bitmap default obstack is merely a convenience and fortunately (content!) lifetime is constrained to be per-pass though that's not fully enforced. Your solution is fine I think and besides when parallelizing the IPA phase should be not worse than making the default-obstack per thread context given how many times we zap it. input_location shouldn't really be used... but oh well. As of per-pass global state I expect that eventually all global variables in the individual passes are a problem and in nearly all cases they are global out of laziness to pass down state across functions. You identified some global state in infrastructure code which are the more interesting cases, most relevant for GIMPLE are the ones in tree-ssa-operands.c, tree-cfg.c and et-forest.c I guess. For individual passes a first step would be to wrap all globals into a struct we can allocate at ::execute time and pass down as pointer. That's slightly less intrusive than wrapping all of the pass in a class but functionally equivalent. <<< 1. The GCC `object_pool_allocator` There is also the GCC object_pool_allocator, which is used to allocate some objects. Since these objects may be used later in the compilation by other threads, we can't simply make them private to each thread. Therefore I added a threadsafe_object_pool_allocator object that currently uses locks guarantee safety, however I am not able to check its correctness. This is also not efficient and might require a better approach later. >>> I guess the same applies to the GC allocator - to make these more efficient we'd have a per-thread freelist we can allocate from without locking and which we'd, once empty, fill from the main pool in larger chunks with locking. At thread finalization we have to return the freelist to the main allocator then and for the GC allocator possibly at garbage collection time. This scheme may also work for the bitmap default obstack and its freelist. I would also suggest when you run into a specific issue with the default obstack to use a separate obstack in the respective area. You mention issues with LTO, are the reproducible on the branch? I suppose you are currently testing GCC with num_threads = 1 to make sure you are not introducing non-threading related issues? Thanks again, Richard.