Re: [gomp4 00/14] NVPTX: further porting

2015-10-29 Thread Alexander Monakov
On Thu, 29 Oct 2015, Jakub Jelinek wrote:
> > rather provide a dummy 'usleep' under #ifdef __nvptx__. WDYT?
>
> Such ifdefs aren't really easily possible in OpenMP right now, the
> preprocessing is done with the host compiler only, you'd need to arrange for
> usleep being defined only in the PTX

Re: [gomp4 00/14] NVPTX: further porting

2015-10-29 Thread Jakub Jelinek
On Wed, Oct 28, 2015 at 08:19:19PM +0300, Alexander Monakov wrote:
> > > On Wed, 21 Oct 2015, Jakub Jelinek wrote:
> > > On Wed, Oct 21, 2015 at 12:16:35PM +0300, Alexander Monakov wrote:
> > > > Of course that doesn't help the thread-limit-2.c testcase.
> > >
> > > Why not?
> >
> > Because th

Re: [gomp4 00/14] NVPTX: further porting

2015-10-28 Thread Alexander Monakov
On Wed, 21 Oct 2015, Jakub Jelinek wrote:
> On Wed, Oct 21, 2015 at 12:16:35PM +0300, Alexander Monakov wrote:
> > > Of course that doesn't help the thread-limit-2.c testcase.
> >
> > Why not?
>
> Because the compiler can be configured for multiple offloading devices,
> and PTX might not be th

Re: [gomp4 00/14] NVPTX: further porting

2015-10-23 Thread Alexander Monakov
On Fri, 23 Oct 2015, Jakub Jelinek wrote:
> Thus, if .shared function local is allowed, we'd need to emit two copies of
> foo, one which assumes it is run in the teams context and one which assumes
> it is run in the parallel context. If automatic vars can be only .local,
> we are just in big trou

Re: [gomp4 00/14] NVPTX: further porting

2015-10-23 Thread Bernd Schmidt
On 10/23/2015 12:16 PM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
> > I'm not really familiar with OpenMP and what it allows, so take all my
> > comments with a grain of salt.
> So [snip - really good example]

Thanks! So what I was trying to describe as a

Re: [gomp4 00/14] NVPTX: further porting

2015-10-23 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
> I'm not really familiar with OpenMP and what it allows, so take all my
> comments with a grain of salt.

The OpenMP execution/data sharing model for the target regions is very roughly that variables referenced in the various construct

Re: [gomp4 00/14] NVPTX: further porting

2015-10-23 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
> I'm not really familiar with OpenMP and what it allows, so take all my
> comments with a grain of salt.
>
> On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> >The second approach is to run all threads in the warp all the time, makin

Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Alexander Monakov
On Thu, 22 Oct 2015, Bernd Schmidt wrote:
> On 10/22/2015 08:08 PM, Alexander Monakov wrote:
> > On Thu, 22 Oct 2015, Bernd Schmidt wrote:
> >
> > > I'm not really familiar with OpenMP and what it allows, so take all my
> > > comments with a grain of salt.
> > >
> > > On 10/22/2015 06:41 PM, Ale

Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Bernd Schmidt
On 10/22/2015 08:08 PM, Alexander Monakov wrote:
> On Thu, 22 Oct 2015, Bernd Schmidt wrote:
>
> > I'm not really familiar with OpenMP and what it allows, so take all my
> > comments with a grain of salt.
> >
> > On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> > > The second approach is to run all threads in the wa

Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Alexander Monakov
On Thu, 22 Oct 2015, Julian Brown wrote:
> > The second approach is to run all threads in the warp all the time,
> > making sure they execute the same code with the same data, and thus
> > build up the same local state. In this case we'd need to ensure this
> > invariant: if threads in the warp ha

Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Alexander Monakov
On Thu, 22 Oct 2015, Bernd Schmidt wrote:
> I'm not really familiar with OpenMP and what it allows, so take all my
> comments with a grain of salt.
>
> On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> > The second approach is to run all threads in the warp all the time, making
> > sure they exe

Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Bernd Schmidt
I'm not really familiar with OpenMP and what it allows, so take all my comments with a grain of salt.

On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> The second approach is to run all threads in the warp all the time, making
> sure they execute the same code with the same data, and thus build up

Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Julian Brown
On Thu, 22 Oct 2015 19:41:51 +0300 Alexander Monakov wrote:
> On Thu, 22 Oct 2015, Jakub Jelinek wrote:
> > Does that apply also to threads within a warp? I.e. is .local
> > local to each thread in the warp, or to the whole warp, and if the
> > former, how can say at the start of a SIMD region o

Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Alexander Monakov
On Thu, 22 Oct 2015, Jakub Jelinek wrote:
> Does that apply also to threads within a warp? I.e. is .local local to each
> thread in the warp, or to the whole warp, and if the former, how can say at
> the start of a SIMD region or at its end the local vars be broadcast to
> other threads and collec

Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 06:18:25PM +0300, Alexander Monakov wrote:
> On Wed, 21 Oct 2015, Bernd Schmidt wrote:
>
> > On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > > This patch series ports enough of libgomp.c to get warp-level parallelism
> > > working for OpenMP offloading. The overall ap

Re: [gomp4 00/14] NVPTX: further porting

2015-10-21 Thread Bernd Schmidt
On 10/21/2015 05:18 PM, Alexander Monakov wrote:
> On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> > On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > > This patch series ports enough of libgomp.c to get warp-level parallelism
> > > working for OpenMP offloading. The overall approach is as follows.
>
> Could you

Re: [gomp4 00/14] NVPTX: further porting

2015-10-21 Thread Alexander Monakov
On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > This patch series ports enough of libgomp.c to get warp-level parallelism
> > working for OpenMP offloading. The overall approach is as follows.
>
> Could you elaborate a bit what you mean by this ju

Re: [gomp4 00/14] NVPTX: further porting

2015-10-21 Thread Bernd Schmidt
On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> This patch series ports enough of libgomp.c to get warp-level parallelism
> working for OpenMP offloading. The overall approach is as follows.

Could you elaborate a bit what you mean by this just so we understand each other in terms of terminolog

Re: [gomp4 00/14] NVPTX: further porting

2015-10-21 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 12:16:35PM +0300, Alexander Monakov wrote:
> > Of course that doesn't help the thread-limit-2.c testcase.
>
> Why not?

Because the compiler can be configured for multiple offloading devices, and PTX might not be the first device. So, you'd need to have a tcl test whether

Re: [gomp4 00/14] NVPTX: further porting

2015-10-21 Thread Alexander Monakov
On Wed, 21 Oct 2015, Jakub Jelinek wrote:
> > time (libcudadevrt.a), and imposes overhead at run time. The last point
> > might
>
> But if this is the case, that is really serious issue. Is that really
> something that isn't available in a shared library?
> E.g. with my distro GCC maintainer ha

Re: [gomp4 00/14] NVPTX: further porting

2015-10-21 Thread Jakub Jelinek
On Tue, Oct 20, 2015 at 09:34:22PM +0300, Alexander Monakov wrote:
> I've opted not to use dynamic parallelism. It increases the hardware
> requirement from sm_30 to sm_35, needs a library from CUDA Toolkit at link

I'll try to add the thread_limit/num_teams arguments to GOMP_target_41 soon (toget

Re: [gomp4 00/14] NVPTX: further porting

2015-10-21 Thread Martin Jambor
Hi,

On Tue, Oct 20, 2015 at 09:34:22PM +0300, Alexander Monakov wrote:
> Hello,
>
> This patch series moves libgomp/nvptx porting further along to get initial
> bits of parallel execution working, mostly unbreaking the testsuite. Please
> have a look! I'm interested in feedback, and would like

[gomp4 00/14] NVPTX: further porting

2015-10-20 Thread Alexander Monakov
Hello,

This patch series moves libgomp/nvptx porting further along to get initial bits of parallel execution working, mostly unbreaking the testsuite. Please have a look! I'm interested in feedback, and would like to know if it's suitable to become a part of a branch.

This patch series ports en