Re: GSOC Question about the parallelization project

2018-03-20 Thread Richard Biener
On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener
 wrote:
> On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters 
>  wrote:
>>>The goal should be to extend TU wise parallelism via make to function
>>wise parallelism within GCC.
>>
>>Could you please elaborate more on this?
>
> In the abstract sense you'd view the compilation process as separated into N
> stages, with each function being processed by each stage. You'd assign a
> thread to each stage and move the work items (the functions) across the set
> of threads, honoring constraints such as an IPA stage needing all functions
> to have completed the previous stage. That makes it easier to model the
> constraints due to shared state (like no pass operating on two functions at
> the same time) compared to a model where you assign a thread to each
> function.
>
> You'll find that the easiest point in the pipeline to try this 'pipelining'
> is after IPA has completed and until RTL is generated.
>
> Ideally the pipelining would start as soon as the front ends have finished
> parsing a function, and ideally we'd have multiple functions in the RTL
> pipeline.
>
> The main obstacle will be the global state in the compiler, of which there is
> the least during the GIMPLE passes (mostly cfun and current_function_decl,
> plus globals in the individual passes, which are most easily dealt with by not
> allowing a single pass to run in multiple threads at the same time). TLS can
> be used for some of the global state, and of course some global data
> structures need locking.
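
A minimal sketch of the stage-per-thread model quoted above, assuming C++17.
Nothing in it is GCC code: Channel, Function and run_pipeline are hypothetical
names, and each "pass" just appends its one-character name to a string.  Each
stage runs on exactly one thread, so no pass ever works on two functions at
the same time, and functions flow from stage to stage through queues.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking FIFO connecting two adjacent stages.
template <typename T>
class Channel {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    // Signals that no further items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks for the next item; returns std::nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Hypothetical work item: a function id plus a log of the passes applied.
struct Function {
    int id;
    std::string passes_run;
};

// Runs one thread per stage; stage_names holds one character per stage.
std::vector<Function> run_pipeline(int num_functions,
                                   const std::string &stage_names) {
    const std::size_t n = stage_names.size();
    std::vector<Channel<Function>> channels(n + 1);
    std::vector<std::thread> stage_threads;
    for (std::size_t s = 0; s < n; ++s) {
        stage_threads.emplace_back([&channels, &stage_names, s] {
            while (auto fn = channels[s].pop()) {
                fn->passes_run += stage_names[s];  // stand-in for the pass
                channels[s + 1].push(std::move(*fn));
            }
            channels[s + 1].close();  // propagate end-of-input downstream
        });
    }
    for (int id = 0; id < num_functions; ++id)
        channels[0].push(Function{id, ""});
    channels[0].close();

    std::vector<Function> done;
    while (auto fn = channels[n].pop())
        done.push_back(std::move(*fn));
    for (auto &t : stage_threads) t.join();
    return done;
}
```

Calling run_pipeline(5, "abc") pushes five functions through three stages.
An IPA-style barrier is not modelled here; it would mean draining one channel
completely before the next stage starts.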

Oh, and just to mention - there are a few things that may block adoption in the
end, like whether builds are still reproducible (we allocate things like
DECL_UID from global pools, and doing that somewhat randomly because of
threading might - but need not - change code generation).  Or that some
diagnostics will appear in non-deterministic order, or that dump files are
messed up (both issues could be solved by code dealing with the issue, like
buffering and doing a replay in program order).  I guess reproducibility is
important when it comes down to debugging code-generation issues - I'd prefer
to debug gcc when it doesn't run threaded, but if that doesn't reproduce an
issue that's bad.

So the most important "milestone" of this project is to identify such issues and
document them somewhere.
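
A minimal sketch of the "buffer and replay in program order" idea, assuming
each function gets its own output buffer.  OutputBuffer, run_pass and
compile_and_replay are hypothetical names, not GCC's diagnostic or dump
machinery.  Threads may finish in any order, but the replay walks the buffers
by function index, so the combined output is the same on every run.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-function buffer for diagnostics or dump output.
struct OutputBuffer {
    std::vector<std::string> lines;
};

// Hypothetical pass: emits one message into the buffer of its own function.
void run_pass(std::size_t fn_index, OutputBuffer &out) {
    out.lines.push_back("warning in function #" + std::to_string(fn_index));
}

// Processes every function on its own thread, then replays the buffered
// output in program order (by function index), not in completion order.
std::vector<std::string> compile_and_replay(std::size_t num_functions) {
    std::vector<OutputBuffer> buffers(num_functions);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_functions; ++i) {
        // Each thread writes only to its own buffer, so no lock is needed.
        workers.emplace_back([i, &buffers] { run_pass(i, buffers[i]); });
    }
    for (auto &t : workers) t.join();

    std::vector<std::string> replayed;
    for (const auto &buf : buffers)
        replayed.insert(replayed.end(), buf.lines.begin(), buf.lines.end());
    return replayed;
}
```

The cost is that nothing is printed until the buffers are replayed, which
matters if the compiler crashes half way through.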

Richard.

> Richard.
>
>>
>>From: Richard Biener 
>>Sent: Monday, March 19, 2018 18:37
>>To: Sebastiaan Peters
>>Cc: gcc@gcc.gnu.org
>>Subject: Re: GSOC Question about the parallelization project
>>
>>On March 19, 2018 4:27:58 PM GMT+01:00, Sebastiaan Peters
>> wrote:
>>>Thank you for your quick response.
>>>
>>>Does the GIMPLE optimization pipeline include only the Tree SSA passes
>>>or also the RTL passes?
>>
>>Yes, it includes only the Tree SSA passes. The RTL part of the
>>pipeline hasn't been audited to work with multiple functions in RTL
>>form at the same time.
>>
>>The only parallelized part of the compiler is LTO byte code write-out
>>at WPA stage which is done in a "fork-and-forget" mode.
>>
>>The goal should be to extend TU wise parallelism via make to function
>>wise parallelism within GCC.
>>
>>Richard.
>>
>>>Are there currently other parts of the compiler that have been
>>>parallelized?
>>>
>>>Kind regards,
>>>
>>>Sebastiaan Peters
>


Re: GSOC Question about the parallelization project

2018-03-20 Thread David Malcolm
On Tue, 2018-03-20 at 14:02 +0100, Richard Biener wrote:
> On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener
>  wrote:
> > On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters  > 7...@hotmail.com> wrote:
> > > > The goal should be to extend TU wise parallelism via make to
> > > > function wise parallelism within GCC.
> > > 
> > > Could you please elaborate more on this?
> > 
> > In the abstract sense you'd view the compilation process as separated
> > into N stages, with each function being processed by each stage. You'd
> > assign a thread to each stage and move the work items (the functions)
> > across the set of threads, honoring constraints such as an IPA stage
> > needing all functions to have completed the previous stage. That makes
> > it easier to model the constraints due to shared state (like no pass
> > operating on two functions at the same time) compared to a model
> > where you assign a thread to each function.
> > 
> > You'll find that the easiest point in the pipeline to try this
> > 'pipelining' is after IPA has completed and until RTL is generated.
> > 
> > Ideally the pipelining would start as soon as the front ends have
> > finished parsing a function, and ideally we'd have multiple
> > functions in the RTL pipeline.
> > 
> > The main obstacle will be the global state in the compiler, of
> > which there is the least during the GIMPLE passes (mostly cfun and
> > current_function_decl, plus globals in the individual passes, which
> > are most easily dealt with by not allowing a single pass to run in
> > multiple threads at the same time). TLS can be used for some of the
> > global state, and of course some global data structures need
> > locking.
> 
> Oh, and just to mention - there are a few things that may block
> adoption in the end, like whether builds are still reproducible (we
> allocate things like DECL_UID from global pools, and doing that
> somewhat randomly because of threading might - but need not - change
> code generation).  Or that some diagnostics will appear in
> non-deterministic order, or that dump files are messed up (both
> issues could be solved by code dealing with the issue, like buffering
> and doing a replay in program order).  I guess reproducibility is
> important when it comes down to debugging code-generation issues -
> I'd prefer to debug gcc when it doesn't run threaded, but if that
> doesn't reproduce an issue that's bad.
> 
> So the most important "milestone" of this project is to identify such
> issues and document them somewhere.

One issue would be the garbage-collector: there are plenty of places in
GCC that have hidden assumptions that "a collection can't happen here"
(where we have temporaries that reference GC-managed objects, but which
aren't tracked by GC-roots).

I had some patches for that back in 2014 that I think I managed to drop
on the floor (sorry):
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01300.html
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01340.html
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01510.html

The GC's allocator is used almost everywhere, and is probably not
thread-safe yet.

FWIW I gave a talk at Cauldron 2013 about global state in GCC.  Beware:
it's five years out-of-date, but maybe is still relevant in places?
  https://dmalcolm.fedorapeople.org/gcc/global-state/
  https://gcc.gnu.org/ml/gcc/2013-05/msg00015.html
(I tackled this for libgccjit by instead introducing a mutex, a "big
compiler lock", jit_mutex in gcc/jit/jit-playback.c, held by whichever
thread is calling into the rest of the compiler sources).
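
A minimal sketch of the "big compiler lock" approach, assuming std::mutex as
the lock type.  big_compiler_lock, global_state, compile_one and
compile_from_threads are hypothetical names used only for illustration; this
is not the code in gcc/jit/jit-playback.c.  Every entry point takes the one
lock, so the unprotected global state is only ever touched by a single thread
at a time.

```cpp
#include <mutex>
#include <thread>
#include <vector>

namespace {
std::mutex big_compiler_lock;  // hypothetical; plays the role of jit_mutex
int global_state = 0;          // stand-in for non-thread-safe compiler globals
}

// Hypothetical entry point: holds the lock for the whole "compilation".
void compile_one() {
    std::lock_guard<std::mutex> guard(big_compiler_lock);
    for (int i = 0; i < 1000; ++i)
        ++global_state;  // safe only because the lock serialises callers
}

// Calls the entry point from several threads and returns the final state.
int compile_from_threads(int num_threads) {
    {
        std::lock_guard<std::mutex> guard(big_compiler_lock);
        global_state = 0;
    }
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back(compile_one);
    for (auto &t : threads) t.join();
    std::lock_guard<std::mutex> guard(big_compiler_lock);
    return global_state;
}
```

This keeps callers safe but gives no speed-up inside the compiler, since the
compilations are serialised.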

Hope this is helpful
Dave

[...]


Re: GSOC Question about the parallelization project

2018-03-20 Thread Richard Biener
On Tue, Mar 20, 2018 at 3:49 PM, David Malcolm  wrote:
> On Tue, 2018-03-20 at 14:02 +0100, Richard Biener wrote:
>> On Mon, Mar 19, 2018 at 9:55 PM, Richard Biener
>>  wrote:
>> > On March 19, 2018 8:09:32 PM GMT+01:00, Sebastiaan Peters > > 7...@hotmail.com> wrote:
>> > > > The goal should be to extend TU wise parallelism via make to
>> > > > function wise parallelism within GCC.
>> > >
>> > > Could you please elaborate more on this?
>> >
>> > In the abstract sense you'd view the compilation process as separated
>> > into N stages, with each function being processed by each stage. You'd
>> > assign a thread to each stage and move the work items (the functions)
>> > across the set of threads, honoring constraints such as an IPA stage
>> > needing all functions to have completed the previous stage. That makes
>> > it easier to model the constraints due to shared state (like no pass
>> > operating on two functions at the same time) compared to a model
>> > where you assign a thread to each function.
>> >
>> > You'll find that the easiest point in the pipeline to try this
>> > 'pipelining' is after IPA has completed and until RTL is generated.
>> >
>> > Ideally the pipelining would start as soon as the front ends have
>> > finished parsing a function, and ideally we'd have multiple
>> > functions in the RTL pipeline.
>> >
>> > The main obstacle will be the global state in the compiler, of
>> > which there is the least during the GIMPLE passes (mostly cfun and
>> > current_function_decl, plus globals in the individual passes, which
>> > are most easily dealt with by not allowing a single pass to run in
>> > multiple threads at the same time). TLS can be used for some of the
>> > global state, and of course some global data structures need
>> > locking.
>>
>> Oh, and just to mention - there are a few things that may block
>> adoption in the end, like whether builds are still reproducible (we
>> allocate things like DECL_UID from global pools, and doing that
>> somewhat randomly because of threading might - but need not - change
>> code generation).  Or that some diagnostics will appear in
>> non-deterministic order, or that dump files are messed up (both
>> issues could be solved by code dealing with the issue, like buffering
>> and doing a replay in program order).  I guess reproducibility is
>> important when it comes down to debugging code-generation issues -
>> I'd prefer to debug gcc when it doesn't run threaded, but if that
>> doesn't reproduce an issue that's bad.
>>
>> So the most important "milestone" of this project is to identify such
>> issues and document them somewhere.
>
> One issue would be the garbage-collector: there are plenty of places in
> GCC that have hidden assumptions that "a collection can't happen here"
> (where we have temporaries that reference GC-managed objects, but which
> aren't tracked by GC-roots).
>
> I had some patches for that back in 2014 that I think I managed to drop
> on the floor (sorry):
>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01300.html
>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01340.html
>   https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01510.html
>
> The GC's allocator is used almost everywhere, and is probably not
> thread-safe yet.

Yes.  There's also global tree modification, like chaining new
pointer types into TYPE_POINTER_TO and friends, so some
helpers in tree.c need to be guarded as well.
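
A minimal sketch of what guarding such a helper could look like, assuming one
global mutex.  TypeNode, type_table_mutex and get_pointer_type are
hypothetical and heavily simplified; they are not GCC's tree representation
or the real helpers in tree.c.  The point is only that the lookup and the
chaining of a new pointer type happen under the same lock, so two threads
cannot both create "pointer to T" for the same T.

```cpp
#include <memory>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical, simplified type node caching its "pointer to this type".
struct TypeNode {
    std::unique_ptr<TypeNode> pointer_to;
    const TypeNode *pointee = nullptr;
};

std::mutex type_table_mutex;  // hypothetical lock guarding pointer_to caches

// Returns the unique pointer type for `to`, creating it on first use.
TypeNode *get_pointer_type(TypeNode &to) {
    std::lock_guard<std::mutex> guard(type_table_mutex);
    if (!to.pointer_to) {
        // Check and creation are in one critical section, so only the
        // first caller builds the node.
        to.pointer_to = std::make_unique<TypeNode>();
        to.pointer_to->pointee = &to;
    }
    return to.pointer_to.get();
}

// Requests the pointer type from several threads and counts how many
// distinct nodes they were handed.
int count_distinct_pointer_types(int num_threads) {
    TypeNode int_type;
    std::vector<TypeNode *> results(num_threads, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back([&int_type, &results, i] {
            results[i] = get_pointer_type(int_type);
        });
    for (auto &t : threads) t.join();
    std::set<TypeNode *> distinct(results.begin(), results.end());
    return static_cast<int>(distinct.size());
}
```

With the lock in place every thread should see the same node; without it, two
threads could each find the cache empty and build their own copy.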

> FWIW I gave a talk at Cauldron 2013 about global state in GCC.  Beware:
> it's five years out-of-date, but maybe is still relevant in places?
>   https://dmalcolm.fedorapeople.org/gcc/global-state/
>   https://gcc.gnu.org/ml/gcc/2013-05/msg00015.html
> (I tackled this for libgccjit by instead introducing a mutex, a "big
> compiler lock", jit_mutex in gcc/jit/jit-playback.c, held by whichever
> thread is calling into the rest of the compiler sources).
>
> Hope this is helpful
> Dave
>
> [...]


How can compiler speed-up postgresql database?

2018-03-20 Thread Martin Liška

Hi.

I did similar stats for postgresql server, more precisely for pgbench:
pgbench -s100 & 10 runs of pgbench -t1 -v

Martin


pgbench-gcc-test.pdf.bz2
Description: application/bzip


pgbench-gcc-test.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Remove *.mirror.babylon.network

2018-03-20 Thread Tim Semeijn
Dear,

For the foreseeable future we will not be able to provide our mirrors
anymore. Could you please remove:

nl.mirror.babylon.network
fr.mirror.babylon.network

Thanks!

-- 
Tim Semeijn
Babylon Network

PGP: 0x2A540FA5 / 3DF3 13FA 4B60 E48A E755 9663 B187 0310 2A54 0FA5


