FYI for others who run into the same problem, see
https://github.com/openucx/ucx/issues/3359. In short:
1. Use UCX 1.5 rather than 1.4 (I recommend updating the instructions at
https://www.open-mpi.org/faq/?category=buildcuda accordingly).
2. Link the cudart library dynamically (by default, nvcc links it
statically).
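For point 2, here is a rough sketch of how the dynamic link could be requested and then checked. nvcc's `--cudart=shared` option selects the shared CUDA runtime. The file names `app.cu` and `app` are placeholders, and `has_shared_cudart` is a hypothetical helper written for this example, not part of CUDA, UCX, or Open MPI.

```shell
# Hypothetical helper: reads `ldd` output on stdin and succeeds if a
# shared libcudart appears among the listed dependencies.
has_shared_cudart() {
    grep -q 'libcudart\.so'
}

# Build with the CUDA runtime linked dynamically (app.cu is a placeholder):
#   nvcc --cudart=shared -o app app.cu
#
# Then confirm that the binary depends on the shared library:
#   ldd ./app | has_shared_cudart && echo "cudart is dynamically linked"
```

If the check fails on a binary built with the default settings, that is expected, since a statically linked cudart does not show up in the `ldd` output.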
I'm running OpenMPI 4.0.0 with gdrcopy 1.3 and UCX 1.4, built per the
instructions at https://www.open-mpi.org/faq/?category=buildcuda against
CUDA 10.0 on RHEL 7. The machine is a p2.xlarge instance in AWS
(single NVIDIA K80 GPU). OpenMPI reports CUDA support:
$ ompi_info --parsable --all