nccl

Karl Rupp Tue, 16 Jun 2020 20:20:16 -0700

From a practical standpoint it seems to me that NCCL is an offering to a community that isn't used to MPI. It's categorized as 'Deep Learning Software' on the NVIDIA page ;-)


The section 'NCCL and MPI' has some interesting bits:
 https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html


At the bottom of the page there is

"Using NCCL to perform inter-GPU communication concurrently with CUDA-aware MPI may create deadlocks. (...) Using both MPI and NCCL to perform transfers between the same sets of CUDA devices concurrently is therefore not guaranteed to be safe."

While I'm impressed that NVIDIA even 'reinvents' MPI for their GPUs to serve the deep learning community, I don't think NCCL provides enough beyond MPI for PETSc.


Best regards,
Karli





On 6/17/20 4:13 AM, Junchao Zhang wrote:

It should be renamed NCL (NVIDIA Communications Library), as it adds point-to-point in addition to collectives. I am not sure whether to implement it in PETSc, as no exascale machine uses NVIDIA GPUs.
--Junchao Zhang
On Tue, Jun 16, 2020 at 6:44 PM Matthew Knepley <[email protected]> wrote:
    It would seem to make more sense to just reverse-engineer this as
    another MPI impl.

        Matt

    On Tue, Jun 16, 2020 at 6:22 PM Barry Smith <[email protected]> wrote:
    --
    What most experimenters take for granted before they begin their
    experiments is infinitely more interesting than any results to which
    their experiments lead.
    -- Norbert Wiener

    https://www.cse.buffalo.edu/~knepley/
    <http://www.cse.buffalo.edu/~knepley/>

Re: [petsc-dev] https://developer.nvidia.com/nccl
