On Thu, 29 Oct 2015, Jakub Jelinek wrote:
> > rather provide a dummy 'usleep' under #ifdef __nvptx__. WDYT?
>
> Such ifdefs aren't really easily possible in OpenMP right now, the
> preprocessing is done with the host compiler only, you'd need to arrange for
> usleep being defined only in the PTX
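A dummy usleep of the kind being discussed would then live on the device
side rather than behind an #ifdef in user code. A minimal sketch, assuming
the nvptx run-time library is the right home for it (the thread had not
settled where it should go):

#include <unistd.h>

/* Device-side stub: there is no OS to sleep on, so succeed trivially.
   A busy-wait loop would be the alternative if real delays mattered.  */
int
usleep (useconds_t us)
{
  (void) us;
  return 0;
}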
On Wed, Oct 28, 2015 at 08:19:19PM +0300, Alexander Monakov wrote:
>
>
> On Wed, 21 Oct 2015, Jakub Jelinek wrote:
>
> > On Wed, Oct 21, 2015 at 12:16:35PM +0300, Alexander Monakov wrote:
> > > > Of course that doesn't help the thread-limit-2.c testcase.
> > >
> > > Why not?
> >
> > Because the compiler can be configured for multiple offloading devices,
On Wed, 21 Oct 2015, Jakub Jelinek wrote:
> On Wed, Oct 21, 2015 at 12:16:35PM +0300, Alexander Monakov wrote:
> > > Of course that doesn't help the thread-limit-2.c testcase.
> >
> > Why not?
>
> Because the compiler can be configured for multiple offloading devices,
> and PTX might not be the first device.
On Fri, 23 Oct 2015, Jakub Jelinek wrote:
> Thus, if .shared function local is allowed, we'd need to emit two copies of
> foo, one which assumes it is run in the teams context and one which assumes
> it is run in the parallel context. If automatic vars can be only .local,
> we are just in big trouble.
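To make the problem concrete, here is a hedged sketch (foo and bar are
illustrative names, not code from the patches): the same function is
reachable from the sequential teams context and from inside a parallel
region, so the compiler cannot statically pick one storage space for its
automatic variables.

#pragma omp declare target
static void
foo (int *p)
{
  int tmp = *p + 1;  /* automatic variable: .local or .shared on PTX?  */
  *p = tmp;
}
#pragma omp end declare target

void
bar (void)
{
  int x = 0;
#pragma omp target map(tofrom: x)
  {
    foo (&x);      /* called in the teams (sequential) context  */
#pragma omp parallel num_threads (2)
#pragma omp master
    foo (&x);      /* called in the parallel context  */
  }
}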
On 10/23/2015 12:16 PM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
> > I'm not really familiar with OpenMP and what it allows, so take all my
> > comments with a grain of salt.
> So
> [snip - really good example]
Thanks!
So what I was trying to describe as a
On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
> I'm not really familiar with OpenMP and what it allows, so take all my
> comments with a grain of salt.
The OpenMP execution/data sharing model for the target regions
is very roughly that variables referenced in the various construct
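As a rough illustration of those rules (an example of mine, not Jakub's):
explicitly mapped variables and implicitly firstprivate scalars behave like
this under OpenMP 4.5:

int
sum_on_device (void)
{
  int n = 10, sum = 0;
  int a[10];
  for (int i = 0; i < n; i++)
    a[i] = i;

  /* 'a' and 'sum' are mapped explicitly; the scalar 'n' is referenced
     but not mapped, so it is implicitly firstprivate on the target
     construct (OpenMP 4.5 rules).  */
#pragma omp target map(to: a) map(tofrom: sum)
#pragma omp parallel for reduction(+: sum)
  for (int i = 0; i < n; i++)
    sum += a[i];

  return sum;  /* 45 */
}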
On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
> I'm not really familiar with OpenMP and what it allows, so take all my
> comments with a grain of salt.
>
> On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> > The second approach is to run all threads in the warp all the time, making
> > sure they execute the same code with the same data, and thus build up the
> > same local state.
On Thu, 22 Oct 2015, Bernd Schmidt wrote:
> On 10/22/2015 08:08 PM, Alexander Monakov wrote:
> > On Thu, 22 Oct 2015, Bernd Schmidt wrote:
> >
> > > I'm not really familiar with OpenMP and what it allows, so take all my
> > > comments with a grain of salt.
> > >
> > > On 10/22/2015 06:41 PM, Alexander Monakov wrote:
On 10/22/2015 08:08 PM, Alexander Monakov wrote:
> On Thu, 22 Oct 2015, Bernd Schmidt wrote:
> > I'm not really familiar with OpenMP and what it allows, so take all my
> > comments with a grain of salt.
> >
> > On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> > > The second approach is to run all threads in the warp all the time,
On Thu, 22 Oct 2015, Julian Brown wrote:
> > The second approach is to run all threads in the warp all the time,
> > making sure they execute the same code with the same data, and thus
> > build up the same local state. In this case we'd need to ensure this
> > invariant: if threads in the warp ha
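The invariant can be illustrated with a short sketch; lane_id() below is a
made-up stand-in for reading %laneid, not an existing libgomp interface:

extern int lane_id (void);  /* hypothetical: returns 0..31 within the warp  */

void
team_sequential_part (int *out)
{
  int a = 6 * 7;       /* fine: every lane computes the same value, so
                          all 32 .local copies stay identical  */
  int b = lane_id ();  /* breaks the invariant: per-lane local state
                          now diverges  */
  *out = a + b;        /* lanes no longer agree on what to store  */
}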
On Thu, 22 Oct 2015, Bernd Schmidt wrote:
> I'm not really familiar with OpenMP and what it allows, so take all my
> comments with a grain of salt.
>
> On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> > The second approach is to run all threads in the warp all the time, making
> > sure they execute the same code with the same data, and thus build up the
> > same local state.
I'm not really familiar with OpenMP and what it allows, so take all my
comments with a grain of salt.
On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> The second approach is to run all threads in the warp all the time, making
> sure they execute the same code with the same data, and thus build up the
> same local state.
On Thu, 22 Oct 2015 19:41:51 +0300
Alexander Monakov wrote:
> On Thu, 22 Oct 2015, Jakub Jelinek wrote:
> > Does that apply also to threads within a warp? I.e. is .local
> > local to each thread in the warp, or to the whole warp, and if the
> > former, how can, say at the start of a SIMD region or at its end, the
> > local vars be broadcast to other threads and collected
On Thu, 22 Oct 2015, Jakub Jelinek wrote:
> Does that apply also to threads within a warp? I.e. is .local local to each
> thread in the warp, or to the whole warp, and if the former, how can, say
> at the start of a SIMD region or at its end, the local vars be broadcast
> to other threads and collected
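One conceivable mechanism, sketched purely as an assumption (warp_shared_slot
and warp_sync are invented stand-ins for per-warp .shared storage and a
warp-level barrier, not existing primitives), is to stage the master lane's
value through memory the whole warp can see:

extern int *warp_shared_slot (void);  /* hypothetical: one .shared word per warp  */
extern void warp_sync (void);         /* hypothetical: barrier across the warp  */

static int
broadcast_from_master (int val, int lane)
{
  int *slot = warp_shared_slot ();
  if (lane == 0)
    *slot = val;   /* the master lane publishes its .local value  */
  warp_sync ();
  return *slot;    /* every lane reads back the same value  */
}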
On Wed, Oct 21, 2015 at 06:18:25PM +0300, Alexander Monakov wrote:
> On Wed, 21 Oct 2015, Bernd Schmidt wrote:
>
> > On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > > This patch series ports enough of libgomp.c to get warp-level parallelism
> > > working for OpenMP offloading. The overall approach is as follows.
On 10/21/2015 05:18 PM, Alexander Monakov wrote:
> On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> > On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > > This patch series ports enough of libgomp.c to get warp-level parallelism
> > > working for OpenMP offloading. The overall approach is as follows.
> > Could you elaborate a bit what you mean by this just so we understand
> > each other in terms of terminology?
On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > This patch series ports enough of libgomp.c to get warp-level parallelism
> > working for OpenMP offloading. The overall approach is as follows.
>
> Could you elaborate a bit what you mean by this just so we understand
> each other in terms of terminology?
On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> This patch series ports enough of libgomp.c to get warp-level parallelism
> working for OpenMP offloading. The overall approach is as follows.
Could you elaborate a bit what you mean by this just so we understand
each other in terms of terminology?
On Wed, Oct 21, 2015 at 12:16:35PM +0300, Alexander Monakov wrote:
> > Of course that doesn't help the thread-limit-2.c testcase.
>
> Why not?
Because the compiler can be configured for multiple offloading devices,
and PTX might not be the first device. So, you'd need to have a tcl
test whether
On Wed, 21 Oct 2015, Jakub Jelinek wrote:
> > time (libcudadevrt.a), and imposes overhead at run time. The last point
> > might
>
> But if this is the case, that is a really serious issue. Is that really
> something that isn't available in a shared library?
> E.g. with my distro GCC maintainer hat on
On Tue, Oct 20, 2015 at 09:34:22PM +0300, Alexander Monakov wrote:
> I've opted not to use dynamic parallelism. It increases the hardware
> requirement from sm_30 to sm_35, needs a library from CUDA Toolkit at link
> time (libcudadevrt.a), and imposes overhead at run time.
I'll try to add the thread_limit/num_teams arguments to GOMP_target_41
soon (toget
Hi,
On Tue, Oct 20, 2015 at 09:34:22PM +0300, Alexander Monakov wrote:
> Hello,
>
> This patch series moves libgomp/nvptx porting further along to get initial
> bits of parallel execution working, mostly unbreaking the testsuite. Please
> have a look! I'm interested in feedback, and would like to know if it's
> suitable to become a part of a branch.
Hello,
This patch series moves libgomp/nvptx porting further along to get initial
bits of parallel execution working, mostly unbreaking the testsuite. Please
have a look! I'm interested in feedback, and would like to know if it's
suitable to become a part of a branch.
This patch series ports enough of libgomp.c to get warp-level parallelism
working for OpenMP offloading. The overall approach is as follows.