Hi, I did make sure at the beginning that only eth0 was activated on all the nodes. Nevertheless, I am currently verifying the NIC configuration on all the nodes and making sure things are as expected.
While trying different things, I came across a peculiar error, which I detailed in one of my previous mails in this thread. I am testing the helloWorld program in the following trivial case:

    mpirun -np 1 -host localhost /main/mpiHelloWorld

This works fine. But

    mpirun -np 1 -host localhost --debug-daemons /main/mpiHelloWorld

always fails as follows:

    Daemon [0,0,1] checking in as pid 2059 on host localhost
    [idx1:02059] [0,0,1] orted: received launch callback
    idx1 is node 0 of 1
    ranks sum to 0
    [idx1:02059] [0,0,1] orted_recv_pls: received message from [0,0,0]
    [idx1:02059] [0,0,1] orted_recv_pls: received exit
    [idx1:02059] *** Process received signal ***
    [idx1:02059] Signal: Segmentation fault (11)
    [idx1:02059] Signal code: (128)
    [idx1:02059] Failing at address: (nil)
    [idx1:02059] [ 0] /lib/libpthread.so.0 [0x2afa8c597f30]
    [idx1:02059] [ 1] /usr/lib64/libopen-rte.so.0(orte_pls_base_close+0x18) [0x2afa8be8e2a2]
    [idx1:02059] [ 2] /usr/lib64/libopen-rte.so.0(orte_system_finalize+0x70) [0x2afa8be795ac]
    [idx1:02059] [ 3] /usr/lib64/libopen-rte.so.0(orte_finalize+0x20) [0x2afa8be7675c]
    [idx1:02059] [ 4] orted(main+0x8a6) [0x4024ae]
    [idx1:02059] *** End of error message ***

The same failure occurs, with more verbose output, when using the -d flag.

Does this point to a bug in OpenMPI, or am I missing something here? I have attached the ompi_info output from this node.

Regards,
Prasanna.
Attachment: ompi_info.txt