Hi all,

We have used Open MPI releases up to version 1.8.7 without any 
problem on a small Linux cluster built with diskless nodes 
(x86_64, Fedora 17, Linux kernel 4.1.1, gcc 4.7.2 20120921 
(Red Hat 4.7.2-2)). 

However, starting with version 1.8.8, we have a problem with the 
mpirun command. 

For instance, with Open MPI 1.10.0, MPI jobs that span the nodes 
and the server run successfully only when they are launched from 
one of the nodes; launching them from the server fails. To 
diagnose this, following the hints given in

http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems

we have tried a simple test:

[jdelia@coyote ~]$ which mpirun 
/usr/beta/openmpi/bin/mpirun
[jdelia@coyote ~]$ mpirun --version
mpirun (Open MPI) 1.10.0
[jdelia@coyote ~]$ hostname
coyote
[jdelia@coyote ~]$ ssh node1
[jdelia@node1 ~]$ mpirun --host coyote hostname
coyote
[jdelia@node1 ~]$ exit
logout
Connection to node1 closed.
[jdelia@coyote ~]$ mpirun --host node1 hostname
[node1:17861] [[8026,0],1] tcp_peer_send_blocking: send() to socket 9 failed: 
Broken pipe (32)
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
... snip ...
--------------------------------------------------------------------------

PATH and LD_LIBRARY_PATH on coyote (the server) and on node1 
were reduced to
 
[jdelia@coyote ]$ ssh coyote env | grep -i PATH
LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
QT_PLUGIN_PATH=/usr/lib64/kde4/plugins:/usr/lib/kde4/plugins

[jdelia@coyote ]$ ssh node1  env | grep -i PATH
LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
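
To compare the two environments entry by entry rather than by eye, 
something like the following could be used. It is only a sketch: 
split_path is a helper name made up here (it is not part of Open MPI), 
and the commented usage assumes passwordless ssh to both hosts, as in 
the tests above.

```shell
# Print each entry of a colon-separated list on its own line.
# split_path is a hypothetical helper name, not part of Open MPI.
split_path() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Example (assumes passwordless ssh to both hosts):
#   split_path "$(ssh coyote 'echo $PATH')" > /tmp/path.coyote
#   split_path "$(ssh node1 'echo $PATH')"  > /tmp/path.node1
#   diff /tmp/path.coyote /tmp/path.node1
```

The same helper works for LD_LIBRARY_PATH; an empty diff means both 
hosts see the same entries in the same order.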

Up to version 1.8.7 all of these tests passed. We then rebuilt 
several Open MPI releases with gcc, using both the system version 

gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)

and the experimental one

$ gcc --version
gcc (GCC) 6.0.0 20150919 (experimental)

but without luck. Again, if we go back to version 1.8.7 with the 
same environment variables, all tests pass. 

Any clue on how to fix this problem would be appreciated.

We are attaching the configure log files for versions 1.8.7 
and 1.8.10, built with the beta gcc compiler.

Thanks in advance.

Regards,
Jorge.
-- 
CIMEC (UNL-CONICET), http://www.cimec.org.ar/
Predio CONICET-Santa Fe, Colec. Ruta Nac. 168, 
Paraje El Pozo, S3000GLN, Santa Fe, ARGENTINA
Univ Nac Litoral (UNL). Cons Nac Inv Científ y Técn (CONICET)

Attachment: make-logs.tgz
Description: application/compressed-tar