Re: GSOC Question about the parallelization project
On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener wrote:
> On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters wrote:
>>> The goal should be to extend TU-wise parallelism via make to
>>> function-wise parallelism within GCC.
>>
>> Could you please elaborate more on this?
>
> In the abstract sense you'd view the compilation process as separated into N
> stages, with each function being processed by each stage. You'd assign a
> thread to each stage and move the work items (the functions) across the set
> of threads, honoring constraints such as an IPA stage needing all functions
> to have completed the previous stage. That lets you model the constraints due
> to shared state (like no pass operating on two functions at the same time)
> more easily than a model where you assign a thread to each function.
>
> You'll find that the easiest point in the pipeline to try this 'pipelining'
> is after IPA has completed and until RTL is generated.
>
> Ideally the pipelining would start as soon as the front ends have finished
> parsing a function, and ideally we'd have multiple functions in the RTL
> pipeline.
>
> The main obstacle will be the global state in the compiler, of which there is
> the least during the GIMPLE passes (mostly cfun and current_function_decl
> plus globals in the individual passes, which are most easily dealt with by
> not allowing a single pass to run at the same time in multiple threads). TLS
> can be used for some of the global state, and of course some global data
> structures need locking.

Oh, and just to mention - there are a few things that may block adoption in the end, like whether builds are still reproducible (we allocate things like DECL_UID from global pools, and doing that in a somewhat random order because of threading might, but need not, change code generation). Or that some diagnostics will appear in non-deterministic order, or that dump files are messed up (both issues could be solved by code dealing with the issue, like buffering and doing a replay in program order).

I guess reproducibility is important when it comes down to debugging code-generation issues - I'd prefer to debug GCC when it doesn't run threaded, but if that doesn't reproduce an issue, that's bad.

So the most important "milestone" of this project is to identify such issues and document them somewhere.

Richard.

> Richard.
>
>> From: Richard Biener
>> Sent: Monday, March 19, 2018 18:37
>> To: Sebastiaan Peters
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: GSOC Question about the parallelization project
>>
>> On March 19, 2018 4:27:58 PM GMT+01:00, Sebastiaan Peters wrote:
>>> Thank you for your quick response.
>>>
>>> Does the GIMPLE optimization pipeline include only the Tree SSA passes
>>> or also the RTL passes?
>>
>> Yes, it includes only Tree SSA passes. The RTL part of the
>> pipeline hasn't been audited to work with multiple functions in RTL
>> form at the same time.
>>
>> The only parallelized part of the compiler is LTO bytecode write-out
>> at WPA stage, which is done in a "fork-and-forget" mode.
>>
>> The goal should be to extend TU-wise parallelism via make to
>> function-wise parallelism within GCC.
>>
>> Richard.
>>
>>> Are there currently other parts of the compiler that have been
>>> parallelized?
>>>
>>> Kind regards,
>>>
>>> Sebastiaan Peters
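For readers following the thread, the thread-per-stage model and the "buffer and replay in program order" idea from this message might be sketched roughly as below. This is only an illustration in C++17, not GCC code: FunctionItem, Channel, run_stage, compile_pipelined and the pass names pass_a and pass_b are all hypothetical, and IPA barriers are not modelled. Each stage runs on exactly one thread, so a pass never works on two functions at once, and each function carries its own log so that output can be replayed deterministically.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical work item: one function travelling through the pipeline.
struct FunctionItem {
    std::size_t order;             // position in program order
    std::string name;
    std::vector<std::string> log;  // buffered diagnostics / dump output
};

// Minimal blocking FIFO; close() tells the consumer that no more items follow.
class Channel {
public:
    void push(FunctionItem item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item arrives; returns nullopt once closed and drained.
    std::optional<FunctionItem> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        FunctionItem item = std::move(items_.front());
        items_.pop();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<FunctionItem> items_;
    bool closed_ = false;
};

// One thread per stage: only this thread ever runs "pass", so the pass's own
// globals are never used for two functions at the same time.
void run_stage(const std::string& pass, Channel& in, Channel& out) {
    while (auto item = in.pop()) {
        item->log.push_back(pass + ": " + item->name);  // buffer, don't print
        out.push(std::move(*item));
    }
    out.close();
}

// Push functions through two stages, then replay the logs in program order.
std::vector<std::string> compile_pipelined(const std::vector<std::string>& functions) {
    Channel to_first, to_second, done;
    std::thread first(run_stage, "pass_a", std::ref(to_first), std::ref(to_second));
    std::thread second(run_stage, "pass_b", std::ref(to_second), std::ref(done));

    for (std::size_t i = 0; i < functions.size(); ++i)
        to_first.push(FunctionItem{i, functions[i], {}});
    to_first.close();

    std::vector<std::vector<std::string>> logs(functions.size());
    while (auto item = done.pop())
        logs[item->order] = std::move(item->log);
    first.join();
    second.join();

    std::vector<std::string> replay;
    for (const auto& log : logs)
        replay.insert(replay.end(), log.begin(), log.end());
    return replay;
}
```

Calling compile_pipelined({"f", "g"}) returns the buffered lines grouped per function, regardless of how the two stage threads happened to interleave while running. Indexing the logs by program order rather than by arrival order is the design choice that would keep the replay stable even if a later variant ran several functions in the same stage concurrently.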
Re: GSOC Question about the parallelization project
On Tue, 2018-03-20 at 14:02 +0100, Richard Biener wrote:
> [...]
> So the most important "milestone" of this project is to identify such
> issues and document them somewhere.

One issue would be the garbage collector: there are plenty of places in GCC that have hidden assumptions that "a collection can't happen here" (where we have temporaries that reference GC-managed objects, but which aren't tracked by GC roots).

I had some patches for that back in 2014 that I think I managed to drop on the floor (sorry):
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01300.html
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01340.html
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01510.html

The GC's allocator is used almost everywhere, and is probably not thread-safe yet.

FWIW I gave a talk at Cauldron 2013 about global state in GCC. Beware: it's five years out of date, but maybe it is still relevant in places?
  https://dmalcolm.fedorapeople.org/gcc/global-state/
  https://gcc.gnu.org/ml/gcc/2013-05/msg00015.html

(I tackled this for libgccjit by instead introducing a mutex, a "big compiler lock", jit_mutex in gcc/jit/jit-playback.c, held by whichever thread is calling into the rest of the compiler sources.)

Hope this is helpful
Dave

[...]
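As a rough illustration of the "big compiler lock" idea, a single mutex can serialise every call into code that relies on unprotected globals. The sketch below is not taken from libgccjit; only the name jit_mutex and its file come from the message above. The globals current_function and compiled, and the functions compile_unlocked, compile and compile_from_threads, are hypothetical stand-ins.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for compiler-wide global state (think of cfun).
std::string current_function;
std::vector<std::string> compiled;

// The "big compiler lock": one mutex guarding every entry into the compiler.
std::mutex compiler_mutex;

// Not thread-safe on its own: reads and writes the globals above.
void compile_unlocked(const std::string& name) {
    current_function = name;
    compiled.push_back("compiled " + current_function);
    current_function.clear();
}

// Public entry point: callers may be on any thread; the lock serialises them.
void compile(const std::string& name) {
    std::lock_guard<std::mutex> hold(compiler_mutex);
    compile_unlocked(name);
}

// Drive compile() from several client threads and report how many results landed.
std::size_t compile_from_threads(const std::vector<std::string>& names) {
    std::vector<std::thread> clients;
    for (const auto& name : names)
        clients.emplace_back([&name] { compile(name); });
    for (auto& client : clients)
        client.join();
    std::lock_guard<std::mutex> hold(compiler_mutex);
    return compiled.size();
}
```

The trade-off is that this makes the compiler safe to call from many threads without making it any faster: only one caller is ever inside at a time, which is why it suits a library entry point better than the function-level parallelism discussed in this thread.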
Re: GSOC Question about the parallelization project
On Tue, Mar 20, 2018 at 3:49 PM, David Malcolm wrote:
> [...]
> The GC's allocator is used almost everywhere, and is probably not
> thread-safe yet.

Yes. There's also global tree modification, like chaining new pointer types into TYPE_POINTER_TO and friends, so some helpers in tree.c need to be guarded as well.

> [...]
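To make the point about guarding helpers concrete, here is a minimal sketch of a lazily created, cached "pointer to T" node protected by a lock. It is loosely modelled on the TYPE_POINTER_TO chaining mentioned above but is not GCC's tree.c code: TypeNode, type_table_mutex, get_pointer_type and all_threads_agree are hypothetical names, and a real implementation would have to cover the other derived-type chains as well.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical type node; pointer_to caches the "pointer to this type" node.
struct TypeNode {
    std::string name;
    std::unique_ptr<TypeNode> pointer_to;
};

// Guards every read and write of any pointer_to field.
std::mutex type_table_mutex;

// Return the unique pointer type for base, creating it on first use.
// Without the lock, two threads could both see nullptr and each build a node.
TypeNode* get_pointer_type(TypeNode& base) {
    std::lock_guard<std::mutex> lock(type_table_mutex);
    if (!base.pointer_to) {
        base.pointer_to = std::make_unique<TypeNode>();
        base.pointer_to->name = base.name + "*";
    }
    return base.pointer_to.get();
}

// Ask for the pointer type from several threads; all must get the same node.
bool all_threads_agree(TypeNode& base, int thread_count) {
    std::vector<TypeNode*> seen(thread_count, nullptr);
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i)
        workers.emplace_back([&base, &seen, i] { seen[i] = get_pointer_type(base); });
    for (auto& worker : workers)
        worker.join();
    for (TypeNode* node : seen)
        if (node != seen[0]) return false;
    return true;
}
```

A single global lock is the simplest choice here; whether a finer-grained scheme would be worthwhile depends on how often such helpers are hit from concurrently running passes, which is something the project would need to measure.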
How can compiler speed-up postgresql database?
Hi.

I did similar stats for the PostgreSQL server, more precisely for pgbench:
pgbench -s100 and 10 runs of pgbench -t1 -v

Martin

pgbench-gcc-test.pdf.bz2
Description: application/bzip
pgbench-gcc-test.ods
Description: application/vnd.oasis.opendocument.spreadsheet
Remove *.mirror.babylon.network
Dear,

For the foreseeable future we will not be able to provide our mirrors anymore. Could you please remove:

nl.mirror.babylon.network
fr.mirror.babylon.network

Thanks!

--
Tim Semeijn
Babylon Network
PGP: 0x2A540FA5 / 3DF3 13FA 4B60 E48A E755 9663 B187 0310 2A54 0FA5