Hi,

Does Open MPI take advantage of CUDA's recent peer-to-peer capabilities
when, e.g., you're running one rank per GPU on a single workstation?

In my testing I only get on the order of 1 GB/s, as reported in
http://www.open-mpi.org/community/lists/users/2011/03/15823.php,
whereas NVIDIA's simpleP2P test indicates ~6 GB/s.

Also, I ran into a problem just trying to test this.  It seems you have
to call cudaSetDevice/cuCtxCreate with the appropriate GPU id, which I
wanted to derive from the rank.  However, you don't know the rank until
after MPI_Init(), and CUDA needs to be initialized before that.  Is
there a standard way to do this?  I have a workaround at the moment.
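
To make the ordering problem concrete, here is a minimal sketch of one
possible approach (not necessarily the workaround mentioned above).  It
assumes the Open MPI launcher exports the per-node rank in the
environment variable OMPI_COMM_WORLD_LOCAL_RANK before the program
starts, so it can be read before MPI_Init().  The helper names
device_for_local_rank and pick_device_from_env are hypothetical, and the
CUDA/MPI calls appear only in comments so the block compiles with the C
standard library alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Map a local-rank string to a GPU index.  Falls back to device 0 if the
 * string is missing or is not a non-negative integer. */
int device_for_local_rank(const char *rank_str, int device_count)
{
    assert(device_count > 0);
    if (rank_str == NULL || *rank_str == '\0')
        return 0;

    char *end = NULL;
    errno = 0;
    long rank = strtol(rank_str, &end, 10);
    if (errno != 0 || *end != '\0' || rank < 0)
        return 0;

    /* Wrap around if there are more local ranks than GPUs. */
    return (int)(rank % device_count);
}

/* Hypothetical helper: read the local rank that the launcher is assumed
 * to export, without needing MPI_Init() to have run. */
int pick_device_from_env(int device_count)
{
    return device_for_local_rank(getenv("OMPI_COMM_WORLD_LOCAL_RANK"),
                                 device_count);
}

/* Intended call order in main(), shown as comments because it needs the
 * CUDA and MPI headers:
 *
 *   int n = 0;
 *   cudaGetDeviceCount(&n);
 *   cudaSetDevice(pick_device_from_env(n));
 *   MPI_Init(&argc, &argv);
 */
```

The modulo is only there so that oversubscribed nodes still get a valid
device index; whether sharing a GPU between ranks is acceptable depends
on the application.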

Thanks,
Chris
