Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

Julian Brown Wed, 25 Feb 2015 04:00:07 -0800

On Wed, 25 Feb 2015 10:36:08 +0100
Thomas Schwinge <tho...@codesourcery.com> wrote:


> Hi!
> 
> On Tue, 24 Feb 2015 11:29:51 +0000, Julian Brown
> <jul...@codesourcery.com> wrote:
> > Test results look OK, barring a suspected harness issue (lib-83
> > failing with a timeout for nvptx
> 
> However, I'm seeing a class of testsuite regressions: all variants of
> libgomp.oacc-fortran/lib-5.f90 and libgomp.oacc-fortran/lib-7.f90
> FAIL: »libgomp: cuMemFreeHost error: invalid value«.  I see these two
> test cases contain a lot of acc_get_num_devices and similar calls --
> I've been testing this on our nvidiak20-2 system, which contains two
> Nvidia K20 cards, so maybe there's something wrong in that regard.
> (But why is this failing only for Fortran -- are we missing C/C++
> tests in that area?) Can you have a look, or want me to?

I can have a look at that.

> > --- a/gcc/config/nvptx/mkoffload.c
> > +++ b/gcc/config/nvptx/mkoffload.c
> > @@ -850,16 +851,17 @@ process (FILE *in, FILE *out)
> 
> >    fprintf (out, "static const void *target_data[] = {\n");
> > -  fprintf (out, "  ptx_code, var_mappings, func_mappings\n");
> > +  fprintf (out, "  ptx_code, (void*) %u, var_mappings, (void*) %u,
> > "
> > +           "func_mappings\n", nvars, nfuncs);
> >    fprintf (out, "};\n\n");
> 
> I wondered if it's maybe more elegant to just separate those by NULL
> delimiters instead of the size integers casted to void * (spaces
> missing)?  But then, that'd need "double scanning" in the consumer,
> libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image, because we
> need to allocate an appropriately sized array, so maybe your more
> expressive approach is better indeed.

Yeah, I considered both: there's probably not much to choose between
the approaches. They use the same amount of space.

> > --- a/libgomp/oacc-async.c
> > +++ b/libgomp/oacc-async.c
> > @@ -34,44 +34,68 @@
> >  int
> >  acc_async_test (int async)
> >  {
> > +  struct goacc_thread *thr = goacc_thread ();
> > +
> >    if (async < acc_async_sync)
> >      gomp_fatal ("invalid async argument: %d", async);
> >  
> > -  return base_dev->openacc.async_test_func (async);
> > +  assert (thr->dev);
> > +
> > +  return thr->dev->openacc.async_test_func (async);
> >  }

> Here, and in several other places: is this code conforming to the
> OpenACC specification?  Do we need to (lazily) initialize in all
> these places, or in goacc_thread, or gracefully fail (see below) if
> not initialized (basically in all places where you currently assert
> (thr->dev)?
> 
>     #include <openacc.h>
>     
>     int main(int argc, char *argv[])
>     {
>       return acc_async_test(0);
>     }
> 
> [sigsegv]

Whether it conforms to the spec or not is a hard question to answer,
because a lot of behaviour is left undefined. But here are two
possibly-useful made-up guidelines:

1. Does the program work the same with OpenACC disabled?

2. Does some strange use of OpenACC functionality (including library
   calls, etc.) probably indicate user error?

Much of the lazy initialisation code is there so that (1) can be true
-- i.e., a program can use OpenACC directives without making an
explicit call to "acc_init" or other API-specific initialisation code.

But this case is an explicit call to the OpenACC runtime library, so the
program can't work without -fopenacc enabled, so we can follow
guideline (2) instead. And in this case, it's meaningless to test for
completion of async operation when no device is active.

Of course though, this should be an actual error rather than a crash.
But, I don't think we want to lazily-initialise here.

> Also, I'm not sure what the expected outcome of this code sequence is:
> 
>     acc_init(acc_device_nvidia);
>     acc_shutdown(acc_device_nvidia);
>     acc_async_test(0);
> 
>     a.out: [...]/source-gcc/libgomp/oacc-async.c:42: acc_async_test:
> Assertion `thr->dev' failed. Aborted (core dumped)
> 
> If the OpenACC specification can be read such that all this indeed is
> "undefined behavior", then aborting/crashing is OK, of course.

Again, this would probably indicate user error in a real program, so it
should raise a (real) error message.

> > --- a/libgomp/oacc-cuda.c
> > +++ b/libgomp/oacc-cuda.c
> > @@ -34,51 +34,53 @@
> >  void *
> >  acc_get_current_cuda_device (void)
> >  {
> > -  void *p = NULL;
> > +  struct goacc_thread *thr = goacc_thread ();
> >  
> > -  if (base_dev && base_dev->openacc.cuda.get_current_device_func)
> > -    p = base_dev->openacc.cuda.get_current_device_func ();
> > +  if (thr && thr->dev &&
> > thr->dev->openacc.cuda.get_current_device_func)
> > +    return thr->dev->openacc.cuda.get_current_device_func ();
> >  
> > -  return p;
> > +  return NULL;
> >  }
> 
> Here, and in other places, it looks as if we'd fail gracefully.

Not sure about this (maybe it should be an error too?), but...

> >  int
> >  acc_set_cuda_stream (int async, void *stream)
> >  {
> > -  int s = -1;
> > +  struct goacc_thread *thr;
> >  
> >    if (async < 0 || stream == NULL)
> >      return 0;
> >  
> >    goacc_lazy_initialize ();
> >  
> > -  if (base_dev && base_dev->openacc.cuda.set_stream_func)
> > -    s = base_dev->openacc.cuda.set_stream_func (async, stream);
> > +  thr = goacc_thread ();
> > +
> > +  if (thr && thr->dev && thr->dev->openacc.cuda.set_stream_func)
> > +    return thr->dev->openacc.cuda.set_stream_func (async, stream);
> >  
> > -  return s;
> > +  return -1;
> >  }
> 
> This one does have a goacc_lazy_initialize call.

This one might indeed be a reasonable way of initialising the OpenACC
runtime: the user is already using CUDA or some other CUDA consumer,
and wishes to use OpenACC also. So this can reasonably be the first
"OpenACC" call in a program.

> > --- a/libgomp/oacc-init.c
> > +++ b/libgomp/oacc-init.c
> 
> > +static const char *
> > +name_of_acc_device_t (enum acc_device_t type)
> > +{
> > +  switch (type)
> > +    {
> > +    case acc_device_none: return "none";
> > +    case acc_device_default: return "default";
> > +    case acc_device_host: return "host";
> > +    case acc_device_host_nonshm: return "host_nonshm";
> > +    case acc_device_not_host: return "not_host";
> > +    case acc_device_nvidia: return "nvidia";
> > +    default: return "<unknown>";
> > +    }
> > +}
> 
> I'd have made the default case abort.  (Does a missing case actually
> trigger a compile-time error?)

We're fixing the list of available offloading plugins at build time,
yes? In that case failing on the default case is reasonable.

> > +  ptx_devices[n] = nvptx_open_device (n);
> > +  instantiated_devices |= 1 << n;
> 
> Here, and also in several other places: do we have to care about big
> values of n?

I wondered that: we could equally just use the ptx_devices array and
remove the bitmask. I'll fix that.

> >  int
> > -GOMP_OFFLOAD_get_table (int n __attribute__ ((unused)),
> > -                   struct mapping_table **tablep)
> > +GOMP_OFFLOAD_load_image (int ord, void *target_data,
> > +                    struct addr_pair **target_table)
> >  {
> >    CUmodule module;
> > -  void **fn_table;
> > -  char **fn_names;
> > -  int fn_entries, i;
> > +  char **fn_names, **var_names;
> > +  unsigned int fn_entries, var_entries, i, j;
> >    CUresult r;
> >    struct targ_fn_descriptor *targ_fns;
> > +  void **img_header = (void **) target_data;
> > +  struct ptx_image_data *new_image;
> > +
> > +  nvptx_attach_host_thread_to_device (ord);
> >  
> >    if (nvptx_init () <= 0)
> >      return 0;
> 
> Need to adapt to the interface change of nvptx_init.  Also, missing
> locking of ptx_dev_lock.

and these other couple of bits too.

Thanks,

Julian

Re: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks (was: Merge current set of OpenACC changes from gomp-4_0-branch)

Reply via email to