Hi Ralph,

Many thanks for your fast answer!

----- Original message -----
> From: "Ralph Castain" <r...@open-mpi.org>
> To: "Open MPI Users" <us...@open-mpi.org>
> Sent: Sunday, 20 September 2015 18:16:56
> Subject: Re: [OMPI users] send() to socket 9 failed with the 1.10.0 version
> but not with 1.8.7 one.
>
> Is the connection from node1 to the head node a direct one,
> or is there a difference in the ethernet subnets between them?

The connection from node1 to the head node is a direct one, i.e.
from the head node to the switch and from the switch to the
computing nodes.

> Can you show us the output of ifconfig from each node?

Yes, of course! Please see the attached tgz file, which also contains
the ompi_info logs.

Thanks.
Jorge.

>
> > On Sep 20, 2015, at 12:19 PM, Jorge D'Elia <jde...@intec.unl.edu.ar> wrote:
> >
> > Hi all,
> >
> > We have used the Open MPI distributions up to the 1.8.7 version
> > without any problem in a small Linux cluster built with diskless
> > nodes (x86_64, Fedora 17, Linux version 4.1.1 (gcc version 4.7.2
> > 20120921 (Red Hat 4.7.2-2) (GCC))).
> >
> > However, from the 1.8.8 version, we have a problem with the
> > mpirun command.
> >
> > For instance, with the 1.10.0 Open MPI version, we can launch MPI
> > jobs across multiple node hosts and the server successfully only if
> > they are launched from any node, but not from the server. In order
> > to fix this, following the hints given in
> >
> > http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
> >
> > we have tried a simple test:
> >
> > [jdelia@coyote ~]$ which mpirun
> > /usr/beta/openmpi/bin/mpirun
> > [jdelia@coyote ~]$ mpirun --version
> > mpirun (Open MPI) 1.10.0
> > [jdelia@coyote ~]$ hostname
> > coyote
> > [jdelia@coyote ~]$ ssh node1
> > [jdelia@node1 ~]$ mpirun --host coyote hostname
> > coyote
> > [jdelia@node1 ~]$ exit
> > logout
> > Connection to node1 closed.
> > [jdelia@coyote ~]$ mpirun --host node1 hostname
> > [node1:17861] [[8026,0],1] tcp_peer_send_blocking: send() to socket 9
> > failed: Broken pipe (32)
> > --------------------------------------------------------------------------
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> > ... snip ...
> > --------------------------------------------------------------------------
> >
> > The PATH and LD_LIBRARY_PATH in coyote (server) and node1
> > were reduced to
> >
> > [jdelia@coyote ]$ ssh coyote env | grep -i PATH
> > LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> > PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> > MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> > QT_PLUGIN_PATH=/usr/lib64/kde4/plugins:/usr/lib/kde4/plugins
> >
> > [jdelia@coyote ]$ ssh node1 env | grep -i PATH
> > LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> > PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> > MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> >
> > Until the 1.8.7 version these tests were all OK. Then, several
> > openmpi distributions were rebuilt using the gcc compilers,
> > both with the system version
> >
> > gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
> >
> > and with the experimental one
> >
> > $ gcc --version
> > gcc (GCC) 6.0.0 20150919 (experimental)
> >
> > but without luck. Again, if we go back to the 1.8.7 version,
> > using the same environment variables, all tests are OK.
> >
> > Any clue on how to fix this problem?
> >
> > We are trying to attach the configure log files of the 1.8.7
> > and 1.8.10 versions built using the beta gcc compiler.
> >
> > Thanks in advance.
> >
> > Regards,
> > Jorge.
> > --
> > CIMEC (UNL-CONICET), http://www.cimec.org.ar/
> > Predio CONICET-Santa Fe, Colec. Ruta Nac. 168,
> > Paraje El Pozo, S3000GLN, Santa Fe, ARGENTINA
> > Univ Nac Litoral (UNL). Cons Nac Inv Científ y Técn (CONICET)
> > <logs.tgz>
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2015/09/27633.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27636.php

ifconfig-ompi-info-log.tgz
Description: application/compressed-tar
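
The quoted message compares the PATH-like environment variables on coyote and node1 by eye. A minimal sketch of the same check done mechanically is given below. The function name compare_path_env and the file names in the usage comment are hypothetical and not part of Open MPI; the sketch assumes the output of `env` on each host has first been saved to a file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print the PATH-like
# variables that differ between two saved `env` dumps.
#
# Assumed usage, with illustrative file names:
#   ssh coyote env > coyote.env
#   ssh node1  env > node1.env
#   compare_path_env coyote.env node1.env
compare_path_env() {
    cpe_a=$(mktemp) || return 2
    cpe_b=$(mktemp) || return 2

    # Same filter as in the quoted message, sorted so that the
    # ordering of variables does not show up as a difference.
    grep -i PATH "$1" | sort > "$cpe_a"
    grep -i PATH "$2" | sort > "$cpe_b"

    # diff exits with 0 when the inputs are identical and 1 when they differ.
    cpe_status=0
    diff "$cpe_a" "$cpe_b" || cpe_status=$?

    rm -f "$cpe_a" "$cpe_b"
    return "$cpe_status"
}
```

For the two listings quoted above, this should report only the QT_PLUGIN_PATH line, which is set on coyote but not on node1.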