Are node1 and the head node connected directly, or are they on different
Ethernet subnets? Can you show us the output of ifconfig from each node?
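
Once you have the address and netmask that ifconfig reports on each side, a quick way to compare them is a small helper like the one below. This is only a sketch: the function names (ip_to_int, same_subnet) are made up for illustration, and it assumes plain IPv4 dotted-quad values with no leading zeros.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Open MPI or any system tool.

# Convert a dotted-quad IPv4 address (e.g. 10.0.0.1) to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Report whether two addresses fall in the same subnet for a given netmask.
# Arguments: address A, address B, netmask (all dotted-quad).
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    # Two hosts share a subnet when their masked network parts are equal.
    if [ $(( a & m )) -eq $(( b & m )) ]; then
        echo "same subnet"
    else
        echo "different subnets"
    fi
}
```

For example, with placeholder values (not your actual addresses), `same_subnet 192.168.1.1 192.168.1.11 255.255.255.0` prints "same subnet", while swapping the second address for 192.168.2.11 prints "different subnets".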


> On Sep 20, 2015, at 12:19 PM, Jorge D'Elia <jde...@intec.unl.edu.ar> wrote:
> 
> Hi all,
> 
> We have used the Open MPI distributions up to version 1.8.7 
> without any problem in a small Linux cluster built with diskless 
> nodes (x86_64, Fedora 17, Linux version 4.1.1 (gcc version 4.7.2 
> 20120921 (Red Hat 4.7.2-2) (GCC))). 
> 
> However, starting with version 1.8.8, we have a problem with the 
> mpirun command. 
> 
> For instance, with Open MPI 1.10.0, we can launch MPI jobs across 
> multiple nodes and the server successfully only if they are 
> launched from a node, not from the server. To fix this, following 
> the hints given in
> 
> http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
> 
> we have tried a simple test:
> 
> [jdelia@coyote ~]$ which mpirun 
> /usr/beta/openmpi/bin/mpirun
> [jdelia@coyote ~]$ mpirun --version
> mpirun (Open MPI) 1.10.0
> [jdelia@coyote ~]$ hostname
> coyote
> [jdelia@coyote ~]$ ssh node1
> [jdelia@node1 ~]$ mpirun --host coyote hostname
> coyote
> [jdelia@node1 ~]$ exit
> logout
> Connection to node1 closed.
> [jdelia@coyote ~]$ mpirun --host node1 hostname
> [node1:17861] [[8026,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> ... snip ...
> --------------------------------------------------------------------------
> 
> The PATH and LD_LIBRARY_PATH in coyote (server) and node1 
> were reduced to
> 
> [jdelia@coyote ]$ ssh coyote env | grep -i PATH
> LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> QT_PLUGIN_PATH=/usr/lib64/kde4/plugins:/usr/lib/kde4/plugins
> 
> [jdelia@coyote ]$ ssh node1  env | grep -i PATH
> LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> 
> Until the 1.8.7 version these tests were all OK. Then, several 
> openmpi distributions were rebuilt using the gcc compilers, 
> both with the system version 
> 
> gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
> 
> and with the experimental one
> 
> $ gcc --version
> gcc (GCC) 6.0.0 20150919 (experimental)
> 
> but without luck. Again, if we go back to the 1.8.7 version, 
> using the same environment variables, all tests are OK. 
> 
> Does anyone have a clue about how to fix this problem?
> 
> We have attached the configure log files of the 1.8.7 
> and 1.8.10 versions, built with the beta gcc compiler.
> 
> Thanks in advance.
> 
> Regards,
> Jorge.
> -- 
> CIMEC (UNL-CONICET), http://www.cimec.org.ar/
> Predio CONICET-Santa Fe, Colec. Ruta Nac. 168, 
> Paraje El Pozo, S3000GLN, Santa Fe, ARGENTINA
> Univ Nac Litoral (UNL). Cons Nac Inv Científ y Técn (CONICET)
> <make-logs.tgz>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/09/27633.php
