Hi Ralph,

Many thanks for your fast answer!

----- Original Message -----
> From: "Ralph Castain" <r...@open-mpi.org>
> To: "Open MPI Users" <us...@open-mpi.org>
> Sent: Sunday, September 20, 2015 18:16:56
> Subject: Re: [OMPI users] send() to socket 9 failed with the 1.10.0 version
> but not with 1.8.7 one.
> 
> Is the connection from node1 to the head node a direct one, 
> or is there a difference in the ethernet subnets between them? 

The connection from node1 to the head node is a direct one, 
i.e., the head node connects to the switch and the switch 
connects to the compute nodes.

> Can you show us the output of ifconfig from each node?

Yes, of course! Please see the attached tgz file, which also 
contains the ompi_info logs.
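
In case it helps anyone reproduce this, here is a minimal sketch of how 
the same per-node information could be gathered. It assumes passwordless 
ssh and the host names from the test quoted below (coyote, node1). The 
function name, the output file names and the RSH override are 
hypothetical, not part of Open MPI.

```shell
# Sketch only: save the ifconfig and ompi_info output of each host.
# Assumption: passwordless ssh to every host listed on the command line.
# RSH is a hypothetical override (default: ssh), handy for a dry run.
collect_node_info() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for host in "$@"; do
        # One file per host and per command, stderr included.
        ${RSH:-ssh} "$host" ifconfig -a > "$outdir/$host.ifconfig.txt" 2>&1
        ${RSH:-ssh} "$host" ompi_info --all > "$outdir/$host.ompi_info.txt" 2>&1
    done
}

# Possible usage (the directory and archive names are just examples):
#   collect_node_info logs coyote node1
#   tar czf logs.tgz logs
```

Setting RSH=echo writes the commands that would be run into the output 
files instead of contacting the hosts.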

Thanks.
Jorge.
 
> > On Sep 20, 2015, at 12:19 PM, Jorge D'Elia <jde...@intec.unl.edu.ar> wrote:
> > 
> > Hi all,
> > 
> > We have used the Open MPI distributions up to the 1.8.7 version
> > without any problem in a small LINUX cluster built with diskless
> > nodes (x86_64, Fedora 17, Linux version 4.1.1 (gcc version 4.7.2
> > 20120921 (Red Hat 4.7.2-2) (GCC))).
> > 
> > However, starting with version 1.8.8, we have a problem with the
> > mpirun command.
> > 
> > For instance, with Open MPI 1.10.0, we can launch MPI jobs across
> > multiple compute nodes and the server successfully only if they
> > are launched from a node, not from the server. To track this
> > down, following the hints given in
> > 
> > http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
> > 
> > we have tried a simple test:
> > 
> > [jdelia@coyote ~]$ which mpirun
> > /usr/beta/openmpi/bin/mpirun
> > [jdelia@coyote ~]$ mpirun --version
> > mpirun (Open MPI) 1.10.0
> > [jdelia@coyote ~]$ hostname
> > coyote
> > [jdelia@coyote ~]$ ssh node1
> > [jdelia@node1 ~]$ mpirun --host coyote hostname
> > coyote
> > [jdelia@node1 ~]$ exit
> > logout
> > Connection to node1 closed.
> > [jdelia@coyote ~]$ mpirun --host node1 hostname
> > [node1:17861] [[8026,0],1] tcp_peer_send_blocking: send() to socket 9
> > failed: Broken pipe (32)
> > --------------------------------------------------------------------------
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> > ... snip ...
> > --------------------------------------------------------------------------
> > 
> > The PATH and LD_LIBRARY_PATH on coyote (the server) and node1
> > were reduced to
> > 
> > [jdelia@coyote ]$ ssh coyote env | grep -i PATH
> > LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> > PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> > MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> > QT_PLUGIN_PATH=/usr/lib64/kde4/plugins:/usr/lib/kde4/plugins
> > 
> > [jdelia@coyote ]$ ssh node1  env | grep -i PATH
> > LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> > PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> > MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> > 
> > Up to version 1.8.7, these tests all passed. We then rebuilt
> > several Open MPI distributions with the gcc compilers,
> > both with the system version
> > 
> > gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
> > 
> > and with the experimental one
> > 
> > $ gcc --version
> > gcc (GCC) 6.0.0 20150919 (experimental)
> > 
> > but without luck. Again, if we go back to version 1.8.7 with
> > the same environment variables, all tests pass.
> > 
> > Does anyone have a clue on how to fix this problem?
> > 
> > We are attaching the configure log files of the 1.8.7
> > and 1.8.10 versions built with the beta gcc compiler.
> > 
> > Thanks in advance.
> > 
> > Regards,
> > Jorge.
> > --
> > CIMEC (UNL-CONICET), http://www.cimec.org.ar/
> > Predio CONICET-Santa Fe, Colec. Ruta Nac. 168,
> > Paraje El Pozo, S3000GLN, Santa Fe, ARGENTINA
> > Univ Nac Litoral (UNL). Cons Nac Inv Científ y Técn (CONICET)
> > <logs.tgz>
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2015/09/27633.php
> 

Attachment: ifconfig-ompi-info-log.tgz
Description: application/compressed-tar
