On 06/25, Richard Biener wrote: > On Tue, 25 Jun 2019, Giuliano Belinassi wrote: > > > Hi, Richard > > > > On 06/25, Richard Biener wrote: > > > On Mon, 24 Jun 2019, Giuliano Belinassi wrote: > > > > > > > Hi, > > > > > > > > Parallelize GCC with Threads -- First Evaluation > > > > > > > > Hi everyone, > > > > > > > > I am attaching the first evaluation report here publicly for gathering > > > > feedback. The file is in markdown format and it can be easily be > > > > converted to > > > > PDF for better visualization. > > > > > > > > I am also open to suggestions and ideas in order to improve the current > > > > project :-) > > > > > > > > My branch can be seen here: > > > > https://gitlab.com/flusp/gcc/tree/giulianob_parallel > > > > > > Thanks for your work on this! > > > > > > I didn't think of the default_obstack and input_location issues > > > so it's good we get to know these. The bitmap default obstack > > > is merely a convenience and fortunately (content!) lifetime is constrained > > > to be per-pass though that's not fully enforced. Your solution is fine > > > I think and besides when parallelizing the IPA phase should be not > > > worse than making the default-obstack per thread context given how > > > many times we zap it. > > > > This is something that is bothering me. In this current state, when I > > declare the default_obstack as TLS, several LTO tests fails in > > ssa_verify and I am not sure why yet. I will investigate futher in > > this. > > If you tell me how to reproduce on your branch I can have a look as well.
To reproduce it, clone the branch with: git clone https://gitlab.com/flusp/gcc.git Then change branch with git checkout giulianob_parallel create a new folder (build_parallel, for instance) and compile it with ../configure --disable-bootstrap --enable-languages=c make -j<NUM_JOBS> Then ``` diff --git a/gcc/bitmap.c b/gcc/bitmap.c index 894aefa13de..87e6c7ac6fc 100644 --- a/gcc/bitmap.c +++ b/gcc/bitmap.c @@ -65,7 +65,7 @@ release_overhead (bitmap b, size_t amount, bool remove_from_map) /* Global data */ bitmap_element bitmap_zero_bits; /* An element of all zero bits. */ -bitmap_obstack bitmap_default_obstack; /* The default bitmap obstack. */ +__thread bitmap_obstack bitmap_default_obstack; /* The default bitmap obstack. */ static int bitmap_default_obstack_depth; static GTY((deletable)) bitmap_element *bitmap_ggc_free; /* Freelist of GC'd elements. */ diff --git a/gcc/bitmap.h b/gcc/bitmap.h index 39f509db611..b31e99ffcd0 100644 --- a/gcc/bitmap.h +++ b/gcc/bitmap.h @@ -359,7 +359,7 @@ struct GTY(()) bitmap_head { /* Global data */ extern bitmap_element bitmap_zero_bits; /* Zero bitmap element */ -extern bitmap_obstack bitmap_default_obstack; /* Default bitmap obstack */ +extern __thread bitmap_obstack bitmap_default_obstack; /* Default bitmap obstack */ /* Change the view of the bitmap to list, or tree. */ void bitmap_list_view (bitmap); ``` then recompile. Just a side note, while I was writting this, Richi found the issue. The obstack_depth also needs to be marked as TLS. Why it only crashed LTO is a mistery, through. > > > As for IPA phase, you mean the expand_ipa or the IPA itself? If > > expand_ipa falls in Inter Process otimization scheme the current > > strategy will work, of course. > > I meant IPA itself. > > > As for IPA itself, I am still not sure if it is parallelizable yet, > > however, I've seen some academic works parallelizing Dataflow > > Analysis, which can give interesting ideas about how to handle the call > > graphs (?). But this may be something interesting as the LTO WHOPR is > > entirely sequential AFIK. > > > > > > > > input_location shouldn't really be used... but oh well. > > > > > > As of per-pass global state I expect that eventually all > > > global variables in the individual passes are a problem > > > and in nearly all cases they are global out of laziness > > > to pass down state across functions. > > > > > > You identified some global state in infrastructure code > > > which are the more interesting cases, most relevant for > > > GIMPLE are the ones in tree-ssa-operands.c, tree-cfg.c > > > and et-forest.c I guess. > > > > > > For individual passes a first step would be to wrap > > > all globals into a struct we can allocate at ::execute > > > time and pass down as pointer. That's slightly less > > > intrusive than wrapping all of the pass in a class > > > but functionally equivalent. > > > > Sounds really good, but this will also require a lot of code change. > > Yeah, it's really a monkeys job ;) One reason I originally proposed > the pipeline approach to avoid the need to fix all of these. > > > Hopefully most of it can be solved without any unpleasant surprises. > > I would guess so, it's also a generally desirable cleanup. > > > > > > > <<< > > > 1. The GCC `object_pool_allocator` > > > > > > There is also the GCC object_pool_allocator, which is used to > > > allocate > > > some > > > objects. Since these objects may be used later in the compilation by > > > other > > > threads, we can't simply make them private to each thread. Therefore > > > I > > > added a > > > threadsafe_object_pool_allocator object that currently uses locks > > > guarantee > > > safety, however I am not able to check its correctness. This is also > > > not efficient and might require a better approach later. > > > >>> > > > > > > I guess the same applies to the GC allocator - to make these > > > more efficient we'd have a per-thread freelist we can allocate > > > from without locking and which we'd, once empty, fill from the > > > main pool in larger chunks with locking. At thread finalization > > > we have to return the freelist to the main allocator then > > > and for the GC allocator possibly at garbage collection time. > > > > This also sounds good. If I understeand correctly, the sequential > > allocation state will be amortized, as we could ask for an object which > > is 2x bigger than each thread currently has. (I am assuming allocation > > is O(1)) > > Yeah, it's mostly trading (freelist) memory for speed. > > > > > > > This scheme may also work for the bitmap default obstack > > > and its freelist. I would also suggest when you run into > > > a specific issue with the default obstack to use a separate > > > obstack in the respective area. You mention issues with LTO, > > > are the reproducible on the branch? I suppose you are > > > currently testing GCC with num_threads = 1 to make sure you > > > are not introducing non-threading related issues? > > > > Yes. The issue with LTO is that obstack, as I mentioned earlier. It is > > reproducible in my branch, just compile it without any changes, then > > declare the obstack TLS, recompile the changes and run gcc.dg tests. > > Trying that but cannot reproduce any failure sofar. Did you adjust > bitmap_default_obstack_depth to be TLS as well? No. You found the issue :P > > Richard.