Answers below...
>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon
>Sent: Thursday, May 21, 2015 2:19 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] cuIpcOpenMemHandle failure when using OpenMPI 1.8.5 with CUDA 7.0 and Multi-Process Service
>
>Received from Lev Givon on Thu, May 21, 2015 at 11:32:33AM EDT:
>> Received from Rolf vandeVaart on Wed, May 20, 2015 at 07:48:15AM EDT:
>>
>> (snip)
>>
>> > I see that you mentioned you are starting 4 MPS daemons.  Are you
>> > following the instructions here?
>> >
>> > http://cudamusing.blogspot.de/2013/07/enabling-cuda-multi-process-service-mps.html
>>
>> Yes - also
>>
>> https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf
>>
>> > This relies on setting CUDA_VISIBLE_DEVICES which can cause problems
>> > for CUDA IPC. Since you are using CUDA 7 there is no more need to
>> > start multiple daemons. You simply leave CUDA_VISIBLE_DEVICES
>> > untouched and start a single MPS control daemon which will handle
>> > all GPUs. Can you try that?
>>
>> I assume that this means that only one CUDA_MPS_PIPE_DIRECTORY value
>> should be passed to all MPI processes.
There is no need to do anything with CUDA_MPS_PIPE_DIRECTORY with CUDA 7.  

>>
>> Several questions related to your comment above:
>>
>> - Should the MPI processes select and initialize the GPUs they respectively
>>   need to access as they normally would when MPS is not in use?
Yes.  
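For illustration, a minimal sketch of the usual pattern (the round-robin rank-to-device mapping and variable names below are placeholders, not anything MPS-specific):

  /* Each rank picks its GPU with cudaSetDevice() exactly as it would
   * without MPS; the single CUDA 7 control daemon serves all devices. */
  #include <mpi.h>
  #include <cuda_runtime.h>

  int main(int argc, char **argv)
  {
      int rank = 0, ndev = 0;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      cudaGetDeviceCount(&ndev);      /* devices visible to this process */
      cudaSetDevice(rank % ndev);     /* simple round-robin mapping */
      /* ... allocate buffers, launch kernels, do MPI communication ... */
      MPI_Finalize();
      return 0;
  }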

>> - Can CUDA_VISIBLE_DEVICES be used to control what GPUs are visible to MPS
>>   (and hence the client processes)? I ask because SLURM uses
>>   CUDA_VISIBLE_DEVICES to control GPU resource allocation, and I would like
>>   to run my program (and the MPS control daemon) on a cluster via SLURM.
Yes, I believe that is true.  
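If it helps to confirm what a client process actually ends up seeing under a SLURM allocation, here is a small diagnostic sketch (nothing MPS-specific, just the runtime API; the output format is arbitrary):

  /* Print what this process enumerates; useful when SLURM (or the MPS
   * control daemon's environment) restricts GPUs via CUDA_VISIBLE_DEVICES. */
  #include <cuda_runtime.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      const char *vis = getenv("CUDA_VISIBLE_DEVICES");
      int ndev = 0;
      cudaGetDeviceCount(&ndev);
      printf("CUDA_VISIBLE_DEVICES=%s, devices enumerated=%d\n",
             vis ? vis : "(unset)", ndev);
      return 0;
  }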

>> - Does the clash between setting CUDA_VISIBLE_DEVICES and CUDA IPC imply that
>>   MPS and CUDA IPC cannot reliably be used simultaneously in a multi-GPU
>>   setting with CUDA 6.5, even when one starts multiple MPS control daemons as
>>   described in the aforementioned blog post?
>
>Using a single control daemon with CUDA_VISIBLE_DEVICES unset appears to
>solve the problem when IPC is enabled.
>--
Glad to see this worked.  And you are correct: CUDA IPC will not work between
devices if they are segregated by the use of CUDA_VISIBLE_DEVICES, as is done
when running multiple MPS control daemons with CUDA 6.5.
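For reference, this is roughly the mechanism that fails; the sketch below uses the runtime-API analogue of the cuIpcOpenMemHandle call from the subject line, with the handle exchange over MPI simplified and the buffer size chosen arbitrarily:

  /* Rank 0 exports a device buffer; rank 1 tries to map it. If the two
   * ranks' devices are hidden from each other by CUDA_VISIBLE_DEVICES,
   * the open step fails -- which is what Open MPI's CUDA IPC path hits. */
  #include <mpi.h>
  #include <cuda_runtime.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      cudaIpcMemHandle_t handle;
      if (rank == 0) {
          void *buf = NULL;
          cudaMalloc(&buf, 1 << 20);
          cudaIpcGetMemHandle(&handle, buf);
          MPI_Send(&handle, sizeof(handle), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          void *mapped = NULL;
          MPI_Recv(&handle, sizeof(handle), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          cudaError_t err = cudaIpcOpenMemHandle(&mapped, handle,
                                                 cudaIpcMemLazyEnablePeerAccess);
          printf("cudaIpcOpenMemHandle: %s\n", cudaGetErrorString(err));
          if (err == cudaSuccess)
              cudaIpcCloseMemHandle(mapped);
      }

      MPI_Finalize();
      return 0;
  }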

Rolf