On Thu, 15 Oct 2015, Jakub Jelinek wrote:
> >   - this functionality doesn't currently work through CUDA MPS
> >     ("multi-process server", for funneling CUDA calls from different
> >     processes through a single "server" process, avoiding context-switch
> >     overhead on the device, sometimes used for CUDA-with-MPI applications);
> 
> That shouldn't be an issue for the OpenMP 4.5 / PTX offloading, right?

I think it can be an issue for applications employing MPI for (coarse-grain)
parallelism and OpenMP for simd/offloading.  It would be a non-issue if PTX
offloading already conflicted with MPS in some other way, but at the moment
I'm not aware of any such conflict (as long as dynamic parallelism is not a
hard requirement).

> >   - it is explicitly forbidden to invoke CUDA API calls from the callback;
> >     perhaps understandable, as the callback may be running in a
> >     signal-handler context (unlikely), or, more plausibly, in a different
> >     thread than the one that registered the callback.
> 
> So, is it run from async signal handlers, or just could be?

The documentation doesn't say.  I could find out experimentally, but that
would only tell us how the current implementation behaves; it could change in
the future.  As I said in the quoted text, I expect it runs asynchronously in
a different thread, rather than in an async signal context.

> Spawning a helper thread is very expensive and we need something to be done
> upon completion pretty much always.  Perhaps we can optimize and somehow
> deal with merging multiple async tasks that are waiting on each other, but
> the user could have intermixed the offloading tasks with host tasks and have
> dependencies in between them, plus there are all the various spots where
> user wants to wait for both host and offloading tasks, or e.g. offloading
> tasks from two different devices, or multiple offloading tasks from the same
> devices (multiple streams), etc.

I think we should avoid involving the host in "reasonable" cases, and for the
rest just have something minimally acceptable (either with callbacks, or
polling).
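
To make the callback-plus-polling split a bit more concrete, here is a rough
host-only sketch.  It is not libgomp code and makes no CUDA calls at all: a
plain pthread stands in for whatever foreign thread the driver would run the
completion callback in, and every name in it (offload_task,
completion_callback, poll_task, run_and_wait) is made up for illustration.
The point is only that the callback publishes a flag and nothing else, while
the thread that registered it does the follow-up work when it polls.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical per-task state; none of these names exist in libgomp or CUDA.  */
struct offload_task
{
  atomic_int done;   /* Set by the "callback" thread when the task completes.  */
  int result;        /* Written before 'done' is published.  */
  int finalized;     /* Touched only by the registering (polling) thread.  */
};

/* Stand-in for a completion callback running in a foreign thread.  It is
   assumed not to be allowed to call back into the runtime, so it only
   records the result and publishes the completion flag.  */
static void *
completion_callback (void *arg)
{
  struct offload_task *t = arg;
  t->result = 42;
  atomic_store_explicit (&t->done, 1, memory_order_release);
  return NULL;
}

/* Poll once from the registering thread.  Returns 1 if the task has
   completed, after doing the follow-up work here, where runtime/API calls
   would be permitted; returns 0 otherwise.  */
static int
poll_task (struct offload_task *t)
{
  if (!atomic_load_explicit (&t->done, memory_order_acquire))
    return 0;
  t->finalized = 1;
  return 1;
}

/* Simulate "launch, then wait": start the fake callback thread, poll until
   it reports completion, and return the task's result (-1 on failure).  */
static int
run_and_wait (void)
{
  struct offload_task t;
  pthread_t th;

  atomic_init (&t.done, 0);
  t.result = 0;
  t.finalized = 0;

  if (pthread_create (&th, NULL, completion_callback, &t) != 0)
    return -1;

  while (!poll_task (&t))
    sched_yield ();   /* A real runtime would run host tasks here instead.  */

  pthread_join (th, NULL);
  assert (t.finalized);
  return t.result;
}
```

Calling run_and_wait () from a small main () is enough to try it (build with
-pthread).  In a real plugin the polling loop would presumably be folded into
the points where the host already waits on tasks, so that no extra helper
thread is needed on the host side.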

Alexander
