Hi all,

In our cluster the nodes are interconnected with RoCE, and I want to set up Open MPI to run over it via SLURM. I initially compiled Open MPI 1.10.2 with only IB verbs support, and with that build I have no problem running over RoCE.
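Just for reference, a direct launch along these lines runs fine over RoCE with that verbs-only build (a rough sketch from memory; the exact command may have differed slightly, but these are essentially the MCA settings I use):

    mpirun -np 2 --host test-vmp1244,test-vmp1245 \
        --mca btl openib,self,sm \
        --mca btl_openib_cpc_include rdmacm \
        ./osu_latency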
I then successfully rebuilt it with SLURM support as follows:

./configure --with-slurm --with-pmi=/usr/scheduler/slurm --with-verbs --with-hwloc

The problem is that I cannot get it to use the RoCE network when I launch with srun. I also tried exporting the Open MPI runtime options as environment variables, but I still cannot initialize the network correctly:

$ echo $OMPI_MCA_btl
openib,self,sm
$ echo $OMPI_MCA_btl_openib_cpc_include
rdmacm
$ srun -n 2 --mpi=pmi2 ./osu_latency
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:       test-vmp1245
  Local device:     mlx4_0
  Local port:       2
  CPCs attempted:   udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:       test-vmp1244
  Local device:     mlx4_0
  Local port:       2
  CPCs attempted:   udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[27,4],0]) is on host: test-vmp1244
  Process 2 ([[27,4],1]) is on host: test-vmp1245
  BTLs attempted: self

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

  * Check the output of ompi_info to see which BTL/MTL plugins are
    available.
  * Run your application with MPI_THREAD_SINGLE.
  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
    if using MTL-based communications) to see exactly which
    communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[test-vmp1245:3603] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: test-vmp1244: task 0: Exited with exit code 1
srun: error: test-vmp1245: task 1: Exited with exit code 1

Any suggestions?

Thanks!
Davide
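P.S. Following the hint at the bottom of that output, I can also rerun with the BTL selection logging turned up (same environment-variable style as above) and post the result if that would help, e.g.:

    $ export OMPI_MCA_btl_base_verbose=100
    $ srun -n 2 --mpi=pmi2 ./osu_latency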