>On Wed, Mar 21, 2018 at 9:50 AM, Sebastiaan Peters ><sebasp...@hotmail.com> wrote: >>>On Tue, Mar 20, 2018 at 3:49 PM, David Malcolm <dmalc...@redhat.com> wrote: >>>> On Tue, 2018-03-20 at 14:02 +0100, Richard Biener wrote: >>>>> On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener >>>>> <richard.guent...@gmail.com> wrote: >>>>> > On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters <sebaspe9 >>>>> > 7...@hotmail.com> wrote: >>>>> > > > The goal should be to extend TU wise parallelism via make to >>>>> > > > function >>>>> > > >>>>> > > wise parallelism within GCC. >>>>> > > >>>>> > > Could you please elaborate more on this? >>>>> > >>>>> > In the abstract sense you'd view the compilation process separated >>>>> > into N stages, each function being processed by each. You'd assign >>>>> > a thread to each stage and move the work items (the functions) >>>>> > across the set of threads honoring constraints such as an IPA stage >>>>> > needing all functions completed the previous stage. That allows you >>>>> > to easier model the constraints due to shared state (like no pass >>>>> > operating on two functions at the same time) compared to a model >>>>> > where you assign a thread to each function. >>>>> > >>>>> > You'll figure that the easiest point in the pipeline to try this >>>>> > 'pipelining' is after IPA has completed and until RTL is generated. >>>>> > >>>>> > Ideally the pipelining would start as early as the front ends >>>>> > finished parsing a function and ideally we'd have multiple >>>>> > functions in the RTL pipeline. >>>>> > >>>>> > The main obstacles will be the global state in the compiler of >>>>> > which there is the least during the GIMPLE passes (mostly cfun and >>>>> > current_function_decl plus globals in the individual passes which >>>>> > is easiest dealt with by not allowing a single pass to run at the >>>>> > same time in multiple threads). TLS can be used for some of the >>>>> > global state plus of course some global data structures need >>>>> > locking. >> >> This would mean that all the passes have to be individually analyzed for >> which global state >> they use, and lock/schedule them accordingly? > >Their private global state would be excempt by assuring that a pass never >runs twice at the same time. > >The global state that remains should be the same for all passes we are talking >about (during the late GIMPLE optimization phase which I'd tackle). > >> If this is the case, is there any documentation that describes the pre-reqs >> for each pass? >> I have looked at the internal documentation, however it seems that all of >> this still has to be created? > >The prereqs are actually the same and not very well documented (if at all). >There's the global GC memory pool where we allocate statements and >stuff like that from (and luckyly statements themselves are function private). >Then there's global trees like types ('int') where modification needs to be >appropriately guarded. Note that "modification" means for example >building a new type for the address of 'int' given that all different >pointer types >to 'int' are chained in a list rooted in the tree for 'int'. That >means (a few?) >tree building helpers need to be guarded with locks. I don't have a great >idea how to identify those apart from knowing them in advance or running >into races ... my gut feeling is that there's not a lot to guard but I may >be wrong ;)
What does it mean to be a node of a type tree? Does it describe information about that type, or does it keep a reference to where something of that type has been declared? >> As to how this would be implemented, it seems the easiest way would be to >> extend the macros to >> accept a pre-req pass. This would encourage more documentation since the >> dependencies >> become explicit instead of the current implicit ordering. > >Actually the order is quite explicit. Maybe I now understand your >question - no, >passes do not "communicate" between each other via global state, all such >state is per function and the execution order of passes on a given function >is hard-coded in passes.def. > >> Assuming the dependencies for the all the tree-ssa passes have to be >> individually analyzed. >> Currently I have this as my timeline: >> - Parallelize the execution of the post-IPA pre-RTL, and a few tree-ssa >> passes (mid-may - early june) >> - Test for possible reproducibility issues for the binary/debug info >> (early june - late june) >> - Parallelize the rest of tree-ssa (late june - late july) >> - Update documentation and test again for reproducibility issues (late >> july - early august) >> >> Would this be acceptable? > >Sounds ambitious ;) But yes, it sounds reasonable. I don't exactly >understand what "Parallelize the rest of tree-ssa" means though. If >you want to tackle a tiny bit more I'd rather include "RTL" by treating >the whole RTL part as a single "pass" (as said only one function can >be in RTL right now). > I was under the assumption that passes had to be indivdually analysed when I wrote that. The timeline is now updated to include some extra time in the beginning to familiarize myself a bit more with the project, and added the RTL as an optional deliverable:) I added a draft of the proposal[0], any feedback would be highly appreciated. >>>>> Oh, and just to mention - there are a few things that may block >>>>> adoption in the end >>>>> like whether builds are still reproducible (we allocate things like >>>>> DECL_UID from >>>>> global pools and doing that somewhat randomly because of threading >>>>> might - but not >>>>> must - change code generation). Or that some diagnostics will appear >>>>> in >>>>> non-deterministic order, or that dump files are messed up (both >>>>> issues could be >>>>> solved by code dealing with the issue, like buffering and doing a re- >>>>> play in >>>>> program order). I guess reproducability is important when it comes >>>>> down to >>>>> debugging code-generation issues - I'd prefer to debug gcc when it >>>>> doesn't run >>>>> threaded but if that doesn't reproduce an issue that's bad. >>>>> >>>>> So the most important "milestone" of this project is to identify such >>>>> issues and >>>>> document them somewhere. >>>> >>>> One issue would be the garbage-collector: there are plenty of places in >>>> GCC that have hidden assumptions that "a collection can't happen here" >>>> (where we have temporaries that reference GC-managed objects, but which >>>> aren't tracked by GC-roots). >>>> >>>> I had some patches for that back in 2014 that I think I managed to drop >>>> on the floor (sorry): >>>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01300.html >>>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01340.html >>>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01510.html >> >> Would there be a way to easily create a static analyzer to find these >> untracked temporaries? > >I don't think so. > >> A quick look at registered passes makes me count ~135 tree-ssa passes, >> So your code on analyzing what globals are referenced where might come in >> handy while analyzing if passes are easily parallelized. > >I think the "solution" to the garbage collecting issue is to simply keep >that serialized. It's currently done on-demand anyway at certain >safe collection points so the work scheduler can simply hold off >scheduling more work when the collector would want to run, waiting for >running jobs to complete. > >Richard. > >>>> The GC's allocator is used almost everywhere, and is probably not >>>> thread-safe yet. >>>Yes. There's also global tree modification like chaining new >>>pointer types into TYPE_POINTER_TO and friends so some >>>helpers in tree.c need to be guarded as well. >>>> FWIW I gave a talk at Cauldron 2013 about global state in GCC. Beware: >>>> it's five years out-of-date, but maybe is still relevant in places? >>>> https://dmalcolm.fedorapeople.org/gcc/global-state/ >>>> https://gcc.gnu.org/ml/gcc/2013-05/msg00015.html >>>> (I tackled this for libgccjit by instead introducing a mutex, a "big >>>> compiler lock", jit_mutex in gcc/jit/jit-playback.c, held by whichever >>>> thread is calling into the rest of the compiler sources). >>>> >>>> Hope this is helpful >>>> Dave >>>> >>>> [0] https://docs.google.com/document/d/1YkKYI3J-pKFfHLdxljWVoJ89l2oVGibxslIW3ikluz8/edit?usp=sharing