Barry's been holding this up too long because of his stubborn dislike of 
adding new Async versions of some routines. I guess he should just give up and 
accept them for now until something better comes along.


> On Feb 15, 2022, at 3:05 PM, Junchao Zhang <[email protected]> wrote:
> 
> Besides the MPI synchronization issue,  we need new async APIs like 
> VecAXPYAsync() to pass scalars produced on device.
> 
> --Junchao Zhang
> 
> 
> On Tue, Feb 15, 2022 at 10:11 AM Jed Brown <[email protected] 
> <mailto:[email protected]>> wrote:
> Note that operations that don't have communication (like VecAXPY and 
> VecPointwiseMult) are already non-blocking on streams. (A recent Thrust 
> update helped us recover what had silently become blocking in a previous 
> release.) For multi-rank, operations like MatMult require communication and 
> MPI doesn't have a way to make it nonblocking. We've had some issues/bugs 
> with NVSHMEM to bypass MPI.
> 
> MPI implementors have been really skeptical of placing MPI operations on 
> streams (like NCCL/RCCL or NVSHMEM). Cray's MPI doesn't have anything to do 
> with streams, device memory is cachable on the host, and RDMA operations are 
> initiated on the host without device logic being involved. I feel like it's 
> going to take company investment or a very enterprising systems researcher to 
> make the case for getting messaging to play well with streams. Perhaps it's a 
> better use of time to focus on reducing latency of notifying the host when 
> RDMA completes and reducing kernel launch time. In short, there are many 
> unanswered questions regarding truly asynchronous Krylov solvers. But in the 
> most obvious places for async, it works currently.
> 
> Jacob Faibussowitsch <[email protected] <mailto:[email protected]>> 
> writes:
> 
> > New code can (and absolutely should) use it right away, PetscDeviceContext 
> > has been fully functional since its merger. Remember though that it works 
> > on a “principled parallelism” model; the caller is responsible for proper 
> > serialization.
> >
> > Existing code? Not so much. In broad strokes the following sections need 
> > support before parallelism can be achieved from user-code:
> >
> > 1. Vec     - WIP (feature complete, now in bug-fixing stage)
> > 2. PetscSF - TODO
> > 3. Mat     - TODO
> > 4. KSP/PC  - TODO
> >
> > Seeing as each MR thus far for this has taken me roughly 3-4 months to 
> > merge, and with the later sections requiring enormous rewrites and API 
> > changes I don’t expect this to be finished for at least 2 years… Once the 
> > Vec MR is merged you could theoretically run with 
> > -device_context_stream_type default_blocking and achieve “asynchronous” 
> > compute but nothing would work properly as every other part of petsc 
> > expects to be synchronous.
> >
> > That being said I would be happy to give a demo to people on how they can 
> > integrate PetscDeviceContext into their code on the next developers 
> > meeting. It would go a long way to cutting down the timeline.
> >
> >> On Feb 15, 2022, at 02:02, Stefano Zampini <[email protected] 
> >> <mailto:[email protected]>> wrote:
> >> 
> >> Jacob
> >> 
> >> what is the current status of the async support in PETSc?
> >> Can you summarize here? Is there any documentation available?
> >> 
> >> Thanks
> >> -- 
> >> Stefano

Reply via email to