You can also get info specifically on the openib parameters with:

    ompi_info --param btl openib --level 9

Your error indicates that udcm may not be enabled on your InfiniBand network, and I don't see it listed in your IB modules; we require it for use of the openib BTL.

Ralph
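For anyone debugging a similar setup, two quick command-line experiments that follow from the advice above (a sketch only; ./hello stands in for your own executable):

    # Take the fabric out of the picture: run over TCP, shared memory, and self only
    mpirun --mca btl tcp,self,sm -np 2 ./hello

    # Or ask the openib BTL to try the rdmacm CPC instead of udcm;
    # rdmacm needs IPoIB, which this cluster reportedly has configured
    mpirun --mca btl_openib_cpc_include rdmacm -np 2 ./hello

If the first command works across nodes, the problem is isolated to the OpenFabrics transport rather than to ssh, name resolution, or the hostfile.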
> On Oct 2, 2015, at 2:49 PM, Surivinta Surivinta <surivi...@gmail.com> wrote:
>
> Maybe you'll find what you need here:
> https://www.open-mpi.org/faq/?category=tuning#setting-mca-params
>
> You can also try running gdb via mpirun to get debug info:
>
>     mpirun -np 2 xterm -e gdb ./your_program
>
>
> 2015-09-28 14:43 GMT+03:00 Sven Schumacher <schumac...@tfd.uni-hannover.de>:
> Hello,
>
> I've set up our new cluster using InfiniBand, with a combination of
> Debian, Torque/Maui, and BeeGFS (formerly FhGFS).
>
> Every node has two InfiniBand ports, each of them having an IP address.
> One port is used for BeeGFS (which is working well) and the other one
> for MPI communication.
>
> I'm using Open MPI 1.8.5, compiled with gcc/gfortran 4.9.2 and ibverbs
> support. The configure command was the following:
>
> Output of "ompi_info --parsable -a -c" is attached as a txt file (all
> nodes are configured the same).
>
> The following InfiniBand-related kernel modules are loaded:
>
> > mlx4_core             206165  1 mlx4_ib
> > rdma_ucm               22055  0
> > ib_uverbs              44693  1 rdma_ucm
> > rdma_cm                39518  2 ib_iser,rdma_ucm
> > iw_cm                  31011  1 rdma_cm
> > ib_umad                17311  0
> > mlx4_ib               136293  0
> > ib_cm                  39055  3 rdma_cm,ib_srp,ib_ipoib
> > ib_sa                  26986  6 rdma_cm,ib_cm,mlx4_ib,ib_srp,rdma_ucm,ib_ipoib
> > ib_mad                 39969  4 ib_cm,ib_sa,mlx4_ib,ib_umad
> > ib_core                68904 12 rdma_cm,ib_cm,ib_sa,iw_cm,mlx4_ib,ib_mad,ib_srp,ib_iser,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
> > ib_addr                17148  3 rdma_cm,ib_core,rdma_ucm
> > ib_iser                44204  0
> > iscsi_tcp              17580  0
> > libiscsi_tcp           21554  1 iscsi_tcp
> > libiscsi               48004  3 libiscsi_tcp,iscsi_tcp,ib_iser
> > scsi_transport_iscsi   77478  4 iscsi_tcp,ib_iser,libiscsi
> > ib_ipoib               85167  0
> > ib_srp                 39710  0
> > scsi_transport_srp     18194  1 ib_srp
> > scsi_tgt               17698  1 scsi_transport_srp
>
> When using mpiexec to run a job on a single node with 8 cores,
> everything works fine, but when mpiexec has to start a process on a
> second node, that process never starts.
>
> What I already did:
>
> Tested ssh logins: works (without a password, using ssh keys).
> Tested name resolution: works.
>
> Used a "hello world" MPI program:
>
> > #include <mpi.h>
> > #include <stdio.h>
> >
> > int main(int argc, char** argv) {
> >     // Initialize the MPI environment
> >     MPI_Init(NULL, NULL);
> >
> >     // Get the number of processes
> >     int world_size;
> >     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> >
> >     // Get the rank of the process
> >     int world_rank;
> >     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> >
> >     // Get the name of the processor
> >     char processor_name[MPI_MAX_PROCESSOR_NAME];
> >     int name_len;
> >     MPI_Get_processor_name(processor_name, &name_len);
> >
> >     // Print off a hello world message
> >     printf("Hello world from processor %s, rank %d"
> >            " out of %d processors\n",
> >            processor_name, world_rank, world_size);
> >
> >     // Finalize the MPI environment.
> >     MPI_Finalize();
> > }
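For reference, a minimal build-and-launch sequence for the program above (file and hostfile names are assumed placeholders):

    mpicc hello.c -o hello                     # Open MPI's C compiler wrapper
    mpirun -np 16 --hostfile myhosts ./hello   # myhosts: one node name per line

With 8 cores per node, -np 16 forces ranks onto a second node, which reproduces the failure mode described below.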
> This throws an error (on a single node it produces the following error
> messages; when run on two nodes it doesn't produce any output at all):
>
> > [hydra001:20324] 1 more process has sent help message
> > help-mpi-btl-openib-cpc-base.txt / no cpcs for port
> > [hydra001:20324] Set MCA parameter "orte_base_help_aggregate" to 0 to
> > see all help / error messages
> > --------------------------------------------------------------------------
> > No OpenFabrics connection schemes reported that they were able to be
> > used on a specific port. As such, the openib BTL (OpenFabrics
> > support) will be disabled for this port.
> >
> >   Local host:      hydra001
> >   Local device:    mlx4_0
> >   Local port:      1
> >   CPCs attempted:  udcm
> > --------------------------------------------------------------------------
> > Hello world from processor hydra001, rank 0 out of 1 processors
>
> So, where can I find a documented list of all these MCA parameters? It
> doesn't seem there is such a list on open-mpi.org, or I didn't find it,
> so thanks in advance for directing me to the right place.
>
> Sven Schumacher
>
> --
> Sven Schumacher - Systemadministrator    Tel: (0511) 762-2753
> Leibniz Universitaet Hannover
> Institut für Turbomaschinen und Fluid-Dynamik - TFD
> Appelstraße 9 - 30167 Hannover
> Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
> Callinstraße 36 - 30167 Hannover
>
>
> --
> Best regards.
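On the closing question about documentation: as far as I know there is no single curated list of MCA parameters; ompi_info is the authoritative source, and parameters can also be pinned in a per-user file instead of on every command line (a sketch, using Open MPI's default config path):

    # Dump every MCA parameter with its current value and help text
    ompi_info --all --level 9

    # Persistent per-user settings go in $HOME/.openmpi/mca-params.conf,
    # one "name = value" pair per line, e.g.:
    #   btl = tcp,self,sm
    #   orte_base_help_aggregate = 0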