vel.
Thanks,
Justin
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart [rvandeva...@nvidia.com]
Sent: Thursday, December 13, 2012 6:18 AM
To: Open MPI Users
Subject: Re: [OMPI users] Stream interactions in CUDA
Hi Justin:
I assume you are running on a single node. In that case, Open MPI is supposed
to take advantage of the CUDA IPC support. This will be used only when
messages are larger than 4K, which yours are. In that case, I would hav
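For illustration, a minimal sketch of the case described above, assuming a CUDA-aware Open MPI build and two ranks on one node; the buffer size and names are made up:

/* Two ranks on one node exchange a 64 KB device buffer, well above the
 * 4 KB threshold, so a CUDA-aware Open MPI build can take the CUDA IPC
 * path described above. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 16;               /* 64 KB message */
    char *d_buf;
    cudaMalloc((void **)&d_buf, n);

    if (rank == 0)
        MPI_Send(d_buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}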
From: users-boun...@open-mpi.org On Behalf Of Jens Glaser
Sent: Wednesday, December 12, 2012 8:12 PM
To: Open MPI Users
Subject: Re: [OMPI users] Stream interactions in CUDA
Hi Justin
from looking at your code it seems you are receiving more bytes from the
processors than you send (I assume MAX_RECV_SIZE_PER_PE > send_sizes[p]).
I don't think this is valid. Your transfers should have matched sizes on the
sending and receiving side. To achieve this, either communicat
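For illustration, a minimal sketch of one way to keep the send and receive counts matched: exchange the sizes first, then transfer exactly that many bytes. The helper and its parameter names are hypothetical; a CUDA-aware build is assumed for the device pointers:

/* Hypothetical helper: exchange the payload sizes first, then transfer
 * exactly that many bytes, so both sides post matched counts.
 * d_send_buf/d_recv_buf are device pointers (CUDA-aware MPI assumed). */
#include <mpi.h>

void exchange_with_peer(void *d_send_buf, int send_size,
                        void *d_recv_buf, int peer, MPI_Comm comm)
{
    int recv_size = 0;

    /* Tell the peer how many bytes to expect, and learn what it will send. */
    MPI_Sendrecv(&send_size, 1, MPI_INT, peer, 0,
                 &recv_size, 1, MPI_INT, peer, 0,
                 comm, MPI_STATUS_IGNORE);

    /* The data transfer now has identical sizes on both ends. */
    MPI_Sendrecv(d_send_buf, send_size, MPI_CHAR, peer, 1,
                 d_recv_buf, recv_size, MPI_CHAR, peer, 1,
                 comm, MPI_STATUS_IGNORE);
}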
Hi Justin,
Quick grepping reveals several cuMemcpy calls in OpenMPI. Some of them are
even synchronous, meaning they use stream 0.
I think the best way of exploring this sort of behavior is to run the
OpenMPI runtime (thanks to its open-source nature!) under a debugger. Rebuild
OpenMPI with -g -O0, add some
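For illustration, a small sketch of why this matters for overlap: a blocking copy on the default stream (like a synchronous cuMemcpy) stalls the host, while cudaMemcpyAsync on a user-created stream does not. The function below is illustrative only:

/* A blocking copy behaves like the synchronous cuMemcpy calls mentioned
 * above: it is tied to the default stream and the host waits for it.
 * An asynchronous copy on a user-created stream can overlap with work
 * on other streams (h_dst should be page-locked for true overlap). */
#include <cuda_runtime.h>

void copy_variants(void *h_dst, const void *d_src, size_t bytes)
{
    /* Synchronous: implicit stream 0, host blocks until the copy is done. */
    cudaMemcpy(h_dst, d_src, bytes, cudaMemcpyDeviceToHost);

    /* Asynchronous: enqueued on a user stream, returns immediately. */
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(h_dst, d_src, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}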
Hello,
I'm working on an application using OpenMPI with CUDA and GPUDirect. I would
like to get the MPI transfers to overlap with computation on the CUDA device.
To do this I need to ensure that no memory transfers go to stream 0.
In this application I have one step that performs an
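For illustration, a generic sketch of the overlap pattern described here, with a kernel on a non-default stream and CUDA-aware MPI transfers of separate device buffers; all names are hypothetical and not from the application:

/* Hypothetical names throughout (my_kernel, d_work, d_send, d_recv).
 * A kernel runs on a user-created stream on one device buffer while
 * CUDA-aware MPI exchanges other device buffers; d_send is assumed to
 * be ready before this function is called. */
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void my_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

void overlapped_step(float *d_work, float *d_send, float *d_recv, int n,
                     int peer, cudaStream_t stream)
{
    MPI_Request reqs[2];

    /* Computation on a non-default stream, touching only d_work. */
    my_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_work, n);

    /* Communication of other device buffers proceeds while the kernel runs. */
    MPI_Irecv(d_recv, n, MPI_FLOAT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(d_send, n, MPI_FLOAT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    cudaStreamSynchronize(stream);
}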