Is node1 connected directly to the head node, or are the two on different Ethernet subnets? Can you show us the output of ifconfig from each node?
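Once you have the ifconfig output, a quick way to answer the subnet question is to compare the network each address falls in. The following is only a minimal sketch in Python using the standard `ipaddress` module. The function name `same_subnet` and the addresses shown are hypothetical placeholders, not values from your cluster, so substitute the inet address and netmask that ifconfig reports on coyote and node1.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if both IPv4 addresses fall in the same network for the given netmask."""
    # strict=False lets us pass a host address; the host bits are masked off.
    net_a = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{netmask}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    # Hypothetical example values (assumption): replace with the inet address
    # and netmask shown by ifconfig on the head node and on node1.
    head_node_addr = "192.168.1.1"
    node1_addr = "192.168.1.11"
    netmask = "255.255.255.0"
    print(same_subnet(head_node_addr, node1_addr, netmask))
```

If this reports False for the interfaces the two machines use to reach each other, the traffic is being routed rather than going over a direct link, which is the case the question above is trying to rule in or out.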
> On Sep 20, 2015, at 12:19 PM, Jorge D'Elia <jde...@intec.unl.edu.ar> wrote:
>
> Hi all,
>
> We have used the Open MPI distributions up to the 1.8.7 version
> without any problem in a small LINUX cluster built with diskless
> nodes (x86_64, Fedora 17, Linux version 4.1.1 (gcc version 4.7.2
> 20120921 (Red Hat 4.7.2-2) (GCC))).
>
> However, from the 1.8.8 version, we have a problem with the
> mpirun command.
>
> For instance, with the 1.10.0 Open MPI version, we can launch MPI
> jobs across multiple node hosts and server successfully only if they
> are launched from any node but not from the server. In order to
> fix, following the hints given in
>
> http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
>
> we have tried a simple test:
>
> [jdelia@coyote ~]$ which mpirun
> /usr/beta/openmpi/bin/mpirun
> [jdelia@coyote ~]$ mpirun --version
> mpirun (Open MPI) 1.10.0
> [jdelia@coyote ~]$ hostname
> coyote
> [jdelia@coyote ~]$ ssh node1
> [jdelia@node1 ~]$ mpirun --host coyote hostname
> coyote
> [jdelia@node1 ~]$ exit
> logout
> Connection to node1 closed.
> [jdelia@coyote ~]$ mpirun --host node1 hostname
> [node1:17861] [[8026,0],1] tcp_peer_send_blocking: send() to socket 9 failed:
> Broken pipe (32)
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> ... snip ...
> --------------------------------------------------------------------------
>
> The PATH and LD_LIBRARY_PATH in coyote (server) and node1
> were reduced to
>
> [jdelia@coyote ]$ ssh coyote env | grep -i PATH
> LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> QT_PLUGIN_PATH=/usr/lib64/kde4/plugins:/usr/lib/kde4/plugins
>
> [jdelia@coyote ]$ ssh node1 env | grep -i PATH
> LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>
> Until the 1.8.7 version these tests were all OK. Then, several
> openmpi distributions were rebuilt using the gcc compilers,
> both with the system version
>
> gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
>
> as with the experimental one
>
> $ gcc --version
> gcc (GCC) 6.0.0 20150919 (experimental)
>
> but without luck. Again, if we go back to 1.8.7 version, and
> using the same environment variables, all tests are OK.
>
> Please, any clue in order to fix this trouble?
>
> We try to attach the configure log files of the 1.8.7
> and 1.8.10 versions using the beta gcc compiler.
>
> Thanks in advance.
>
> Regards,
> Jorge.
> --
> CIMEC (UNL-CONICET), http://www.cimec.org.ar/
> Predio CONICET-Santa Fe, Colec. Ruta Nac. 168,
> Paraje El Pozo, S3000GLN, Santa Fe, ARGENTINA
> Univ Nac Litoral (UNL).
> Cons Nac Inv Científ y Técn (CONICET)
> <make-logs.tgz>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27633.php