I'm running Open MPI 4.0.0, built against CUDA 10.0 with gdrcopy 1.3 and
UCX 1.4 per the instructions at
https://www.open-mpi.org/faq/?category=buildcuda, on RHEL 7.  I'm running
on a p2.xlarge instance in AWS (single NVIDIA K80 GPU).  Open MPI reports
CUDA support:
$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:true

I'm attempting to use MPI_Ialltoall() to overlap a block of GPU
computations with network transfers, using MPI_Test() to nudge the
asynchronous transfer along.  Based on table 5 in
https://www.open-mpi.org/faq/?category=runcuda, MPI_Ialltoall() should be
supported (MPI_Test() isn't called out as supported or not, but my example
crashes with or without it).  With a small number of elements everything
runs without issue.  However, with a larger number of elements (where
"large" is just a few hundred), I start to get errors like
"cma_ep.c:113  UCX  ERROR process_vm_readv delivered 0 instead of 16000,
error message Bad address".  Switching to a blocking MPI_Alltoall() lets
the program run successfully.
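
For context, the shape of what I'm doing is roughly the following (a
simplified sketch, not the exact code in the gist; the element count and
the GPU work are placeholders):

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int size = 1;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Placeholder count; the real code takes this from the command line.
    const int count = 400;

    // Device buffers handed straight to MPI (CUDA-aware path).
    float *d_send = nullptr, *d_recv = nullptr;
    cudaMalloc(&d_send, count * size * sizeof(float));
    cudaMalloc(&d_recv, count * size * sizeof(float));
    cudaMemset(d_send, 0, count * size * sizeof(float));

    // Start the non-blocking all-to-all on the device pointers.
    MPI_Request req;
    MPI_Ialltoall(d_send, count, MPI_FLOAT,
                  d_recv, count, MPI_FLOAT, MPI_COMM_WORLD, &req);

    // ... launch GPU work on other buffers here ...

    // Nudge the progress engine while the GPU work runs, then fall
    // through once the transfer completes.
    int done = 0;
    while (!done)
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}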

I tried boiling my issue down to the simplest program I could that
recreates the crash.  Note that it needs to be compiled with
"--std=c++11".  Running "mpirun -np 2 mpi_test_ialltoall 200 256 10"
succeeds; changing the 200 to 400 results in a crash after a few blocks.
Thanks for any thoughts.

Code sample:
https://gist.github.com/asylvest/7c9d5c15a3a044a0a2338cf9c828d2c3
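
For reference, I build and run it roughly like this (the source file name
and CUDA include/library paths are whatever your setup uses):

$ mpicxx --std=c++11 mpi_test_ialltoall.cpp -o mpi_test_ialltoall -lcudart
$ mpirun -np 2 mpi_test_ialltoall 200 256 10    # runs fine
$ mpirun -np 2 mpi_test_ialltoall 400 256 10    # crashes after a few blocks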

-Adam