>On Wed, Mar 21, 2018 at 8:04 PM, Sebastiaan Peters ><sebasp...@hotmail.com> wrote: >>>On Wed, Mar 21, 2018 at 9:50 AM, Sebastiaan Peters >>><sebasp...@hotmail.com> wrote: >>>>>On Tue, Mar 20, 2018 at 3:49 PM, David Malcolm <dmalc...@redhat.com> wrote: >>>>>> On Tue, 2018-03-20 at 14:02 +0100, Richard Biener wrote: >>>>>>> On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener >>>>>>> <richard.guent...@gmail.com> wrote: >>>>>>> > On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters <sebaspe9 >>>>>>> > 7...@hotmail.com> wrote: >>>>>>> > > > The goal should be to extend TU wise parallelism via make to >>>>>>> > > > function >>>>>>> > > >>>>>>> > > wise parallelism within GCC. >>>>>>> > > >>>>>>> > > Could you please elaborate more on this? >>>>>>> > >>>>>>> > In the abstract sense you'd view the compilation process separated >>>>>>> > into N stages, each function being processed by each. You'd assign >>>>>>> > a thread to each stage and move the work items (the functions) >>>>>>> > across the set of threads honoring constraints such as an IPA stage >>>>>>> > needing all functions completed the previous stage. That allows you >>>>>>> > to easier model the constraints due to shared state (like no pass >>>>>>> > operating on two functions at the same time) compared to a model >>>>>>> > where you assign a thread to each function. >>>>>>> > >>>>>>> > You'll figure that the easiest point in the pipeline to try this >>>>>>> > 'pipelining' is after IPA has completed and until RTL is generated. >>>>>>> > >>>>>>> > Ideally the pipelining would start as early as the front ends >>>>>>> > finished parsing a function and ideally we'd have multiple >>>>>>> > functions in the RTL pipeline. >>>>>>> > >>>>>>> > The main obstacles will be the global state in the compiler of >>>>>>> > which there is the least during the GIMPLE passes (mostly cfun and >>>>>>> > current_function_decl plus globals in the individual passes which >>>>>>> > is easiest dealt with by not allowing a single pass to run at the >>>>>>> > same time in multiple threads). TLS can be used for some of the >>>>>>> > global state plus of course some global data structures need >>>>>>> > locking. >>>> >>>> This would mean that all the passes have to be individually analyzed for >>>> which global state >>>> they use, and lock/schedule them accordingly? >>> >>>Their private global state would be excempt by assuring that a pass never >>>runs twice at the same time. >>> >>>The global state that remains should be the same for all passes we are >>>talking >>>about (during the late GIMPLE optimization phase which I'd tackle). >>> >>>> If this is the case, is there any documentation that describes the pre-reqs >>>> for each pass? >>>> I have looked at the internal documentation, however it seems that all of >>>> this still has to be created? >>> >>>The prereqs are actually the same and not very well documented (if at all). >>>There's the global GC memory pool where we allocate statements and >>>stuff like that from (and luckyly statements themselves are function >>>private). >>>Then there's global trees like types ('int') where modification needs to be >>>appropriately guarded. Note that "modification" means for example >>>building a new type for the address of 'int' given that all different >>>pointer types >>>to 'int' are chained in a list rooted in the tree for 'int'. That >>>means (a few?) >>>tree building helpers need to be guarded with locks. I don't have a great >>>idea how to identify those apart from knowing them in advance or running >>>into races ... my gut feeling is that there's not a lot to guard but I may >>>be wrong ;) >> >> What does it mean to be a node of a type tree? >> Does it describe information about that type, >> or does it keep a reference to where something >> of that type has been declared? > >On the GIMPLE level we have two main kinds of data objects, one is >'gimple' (and derived clases) for statements and 'tree' for operands, >types, declarations. 'tree's is also what GENERIC is represented in, the >IL that gets produced by the frontends which is lowered to GIMPLE by >the gimplifier. > >On GENERIC everything is a 'tree' (there's a class hierarchy of trees >implemented in C ways via unions), types are, and so are decls >and expressions (in GENERIC). You can look at tree.h (and tree-core.h >for implementation details) and for example see > >#define TYPE_POINTER_TO(NODE) (TYPE_CHECK (NODE)->type_common.pointer_to) > >where tree.c:build_pointer_type_for_mode does > >/* First, if we already have a type for pointers to TO_TYPE and it's >the proper mode, use it. */ >for (t = TYPE_POINTER_TO (to_type); t; t = TYPE_NEXT_PTR_TO (t)) >if (TYPE_MODE (t) == mode && TYPE_REF_CAN_ALIAS_ALL (t) == can_alias_all) >return t; > >so if a pass builds a new pointer type that doesn't already exist we >modify this list. >This means an existing type tree is modified even if the type itself >isn't modified. > >>>> As to how this would be implemented, it seems the easiest way would be to >>>> extend the macros to >>>> accept a pre-req pass. This would encourage more documentation since the >>>> dependencies >>>> become explicit instead of the current implicit ordering. >>> >>>Actually the order is quite explicit. Maybe I now understand your >>>question - no, >>>passes do not "communicate" between each other via global state, all such >>>state is per function and the execution order of passes on a given function >>>is hard-coded in passes.def. >>> >>>> Assuming the dependencies for the all the tree-ssa passes have to be >>>> individually analyzed. >>>> Currently I have this as my timeline: >>>> - Parallelize the execution of the post-IPA pre-RTL, and a few tree-ssa >>>> passes (mid-may - early june) >>>> - Test for possible reproducibility issues for the binary/debug info >>>> (early june - late june) >>>> - Parallelize the rest of tree-ssa (late june - late july) >>>> - Update documentation and test again for reproducibility issues (late >>>> july - early august) >>>> >>>> Would this be acceptable? >>> >>>Sounds ambitious ;) But yes, it sounds reasonable. I don't exactly >>>understand what "Parallelize the rest of tree-ssa" means though. If >>>you want to tackle a tiny bit more I'd rather include "RTL" by treating >>>the whole RTL part as a single "pass" (as said only one function can >>>be in RTL right now). >>> >> >> I was under the assumption that passes had to be indivdually analysed >> when I wrote that. The timeline is now updated to include some extra >> time in the beginning to familiarize myself a bit more with the project, >> and added the RTL as an optional deliverable:) >> then >> I added a draft of the proposal[0], any feedback would be highly appreciated. > >The poposal looks good to me! > >Thanks, >Richard.
Awesome, I'll submit this is the coming days. Thanks for all your help, it has helped tremendously. I'm looking forward to (hopefully) work with you this summer. Kind regards, Sebastiaan Peters >>>>>>> Oh, and just to mention - there are a few things that may block >>>>>>> adoption in the end >>>>>>> like whether builds are still reproducible (we allocate things like >>>>>>> DECL_UID from >>>>>>> global pools and doing that somewhat randomly because of threading >>>>>>> might - but not >>>>>>> must - change code generation). Or that some diagnostics will appear >>>>>>> in >>>>>>> non-deterministic order, or that dump files are messed up (both >>>>>>> issues could be >>>>>>> solved by code dealing with the issue, like buffering and doing a re- >>>>>>> play in >>>>>>> program order). I guess reproducability is important when it comes >>>>>>> down to >>>>>>> debugging code-generation issues - I'd prefer to debug gcc when it >>>>>>> doesn't run >>>>>>> threaded but if that doesn't reproduce an issue that's bad. >>>>>>> >>>>>>> So the most important "milestone" of this project is to identify such >>>>>>> issues and >>>>>>> document them somewhere. >>>>>> >>>>>> One issue would be the garbage-collector: there are plenty of places in >>>>>> GCC that have hidden assumptions that "a collection can't happen here" >>>>>> (where we have temporaries that reference GC-managed objects, but which >>>>>> aren't tracked by GC-roots). >>>>>> >>>>>> I had some patches for that back in 2014 that I think I managed to drop >>>>>> on the floor (sorry): >>>>>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01300.html >>>>>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01340.html >>>>>> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01510.html >>>> >>>> Would there be a way to easily create a static analyzer to find these >>>> untracked temporaries? >>> >>>I don't think so. >>> >>>> A quick look at registered passes makes me count ~135 tree-ssa passes, >>>> So your code on analyzing what globals are referenced where might come in >>>> handy while analyzing if passes are easily parallelized. >>> >>>I think the "solution" to the garbage collecting issue is to simply keep >>>that serialized. It's currently done on-demand anyway at certain >>>safe collection points so the work scheduler can simply hold off >>>scheduling more work when the collector would want to run, waiting for >>>running jobs to complete. >>> >>>Richard. >>> >>>>>> The GC's allocator is used almost everywhere, and is probably not >>>>>> thread-safe yet. >>>>>Yes. There's also global tree modification like chaining new >>>>>pointer types into TYPE_POINTER_TO and friends so some >>>>>helpers in tree.c need to be guarded as well. >>>>>> FWIW I gave a talk at Cauldron 2013 about global state in GCC. Beware: >>>>>> it's five years out-of-date, but maybe is still relevant in places? >>>>>> https://dmalcolm.fedorapeople.org/gcc/global-state/ >>>>>> https://gcc.gnu.org/ml/gcc/2013-05/msg00015.html >>>>>> (I tackled this for libgccjit by instead introducing a mutex, a "big >>>>>> compiler lock", jit_mutex in gcc/jit/jit-playback.c, held by whichever >>>>>> thread is calling into the rest of the compiler sources). >>>>>> >>>>>> Hope this is helpful >>>>>> Dave >>>>>> >>>>>> >> >> [0] >> https://docs.google.com/document/d/1YkKYI3J-pKFfHLdxljWVoJ89l2oVGibxslIW3ikluz8/edit?usp=sharing <https://docs.google.com/document/d/1YkKYI3J-pKFfHLdxljWVoJ89l2oVGibxslIW3ikluz8/edit?usp=sharing>