Barry's been holding this up too long because of his stubborn dislike of adding new Async versions of some routines. I guess he should just give up and accept them for now until something better comes along.
> On Feb 15, 2022, at 3:05 PM, Junchao Zhang <[email protected]> wrote:
>
> Besides the MPI synchronization issue, we need new async APIs like
> VecAXPYAsync() to pass scalars produced on device.
>
> --Junchao Zhang
>
>
> On Tue, Feb 15, 2022 at 10:11 AM Jed Brown <[email protected]> wrote:
> Note that operations that don't have communication (like VecAXPY and
> VecPointwiseMult) are already non-blocking on streams. (A recent Thrust
> update helped us recover what had silently become blocking in a previous
> release.) For multi-rank, operations like MatMult require communication and
> MPI doesn't have a way to make it nonblocking. We've had some issues/bugs
> with NVSHMEM to bypass MPI.
>
> MPI implementors have been really skeptical of placing MPI operations on
> streams (like NCCL/RCCL or NVSHMEM). Cray's MPI doesn't have anything to do
> with streams, device memory is cachable on the host, and RDMA operations are
> initiated on the host without device logic being involved. I feel like it's
> going to take company investment or a very enterprising systems researcher to
> make the case for getting messaging to play well with streams. Perhaps it's a
> better use of time to focus on reducing latency of notifying the host when
> RDMA completes and reducing kernel launch time. In short, there are many
> unanswered questions regarding truly asynchronous Krylov solvers. But in the
> most obvious places for async, it works currently.
>
> Jacob Faibussowitsch <[email protected]> writes:
>
> > New code can (and absolutely should) use it right away, PetscDeviceContext
> > has been fully functional since its merger. Remember though that it works
> > on a “principled parallelism” model; the caller is responsible for proper
> > serialization.
> >
> > Existing code? Not so much. In broad strokes the following sections need
> > support before parallelism can be achieved from user-code:
> >
> > 1. Vec - WIP (feature complete, now in bug-fixing stage)
> > 2. PetscSF - TODO
> > 3. Mat - TODO
> > 4. KSP/PC - TODO
> >
> > Seeing as each MR thus far for this has taken me roughly 3-4 months to
> > merge, and with the later sections requiring enormous rewrites and API
> > changes I don’t expect this to be finished for at least 2 years… Once the
> > Vec MR is merged you could theoretically run with
> > -device_context_stream_type default_blocking and achieve “asynchronous”
> > compute but nothing would work properly as every other part of petsc
> > expects to be synchronous.
> >
> > That being said I would be happy to give a demo to people on how they can
> > integrate PetscDeviceContext into their code on the next developers
> > meeting. It would go a long way to cutting down the timeline.
> >
> >> On Feb 15, 2022, at 02:02, Stefano Zampini <[email protected]> wrote:
> >>
> >> Jacob
> >>
> >> what is the current status of the async support in PETSc?
> >> Can you summarize here? Is there any documentation available?
> >>
> >> Thanks
> >> --
> >> Stefano
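For anyone following along, the pattern being discussed would look roughly like the sketch below. This is illustration only: VecAXPYAsync() is the name Junchao proposes above and does not exist yet, its signature here is an assumption, and the surrounding PetscDeviceContext calls just follow the lifecycle Jacob describes.

    /* Sketch only -- VecAXPYAsync() is the *proposed* API from this thread,
     * not something you can call today. The device-context setup follows the
     * usual PETSc create/set/setup pattern. */
    PetscDeviceContext dctx;

    PetscCall(PetscDeviceContextCreate(&dctx));
    PetscCall(PetscDeviceContextSetStreamType(dctx, PETSC_STREAM_DEFAULT_BLOCKING));
    PetscCall(PetscDeviceContextSetUp(dctx));

    /* alpha is a scalar living in device memory; the Async variant would
     * enqueue the operation on dctx's stream instead of synchronizing to
     * read the scalar on the host */
    PetscCall(VecAXPYAsync(y, alpha, x, dctx)); /* hypothetical */

    /* "principled parallelism": the caller serializes before reusing results */
    PetscCall(PetscDeviceContextSynchronize(dctx));
    PetscCall(PetscDeviceContextDestroy(&dctx));

Per Jacob's note, once the Vec MR lands you could in principle pair this with -device_context_stream_type default_blocking on the command line, with the caveat he gives that nothing downstream of Vec is async-aware yet.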
