----- Original Message -----
> From: "Ralph Castain" <r...@open-mpi.org>
> To: "Open MPI Users" <us...@open-mpi.org>
> Sent: Monday, September 21, 2015 1:42:08
> Subject: Re: [OMPI users] send() to socket 9 failed with the 1.10.0 version
> but not with 1.8.7 one.
> 
> Okay, let’s try doing this:
> 
> mpirun -mca oob_tcp_if_include br0 …
> 
> This will restrict us to the br0 interface that is common to the two nodes. 

It works fine! Here is a copy-and-paste of a session using
the hello_usempi_f08.f90 sample:

[jdelia@coyote 1.10.0]$ mpifort --version 
GNU Fortran (GCC) 6.0.0 20150919 (experimental)
Copyright (C) 2015 Free Software Foundation, Inc.

[jdelia@coyote 1.10.0]$ mpirun --version 
mpirun (Open MPI) 1.10.0
Report bugs to http://www.open-mpi.org/community/help/

[jdelia@coyote 1.10.0]$ mpifort -o hello_usempi_f08.exe hello_usempi_f08.f90

[jdelia@coyote 1.10.0]$ cat ~/machi-openmpi.dat
coyote slots=2 max_slots=2
node1  slots=2 max_slots=6
node2  slots=2 max_slots=8

[jdelia@coyote 1.10.0]$ mpirun --mca btl self,tcp --map-by node \
    --mca oob_tcp_if_include br0 --np 5 --report-bindings \
    --machinefile ~/machi-openmpi.dat hello_usempi_f08.exe
[coyote:28957] MCW rank 3 is not bound (or bound to all available processors)
[coyote:28957] MCW rank 0 is not bound (or bound to all available processors)
[node2:11855] MCW rank 2 is not bound (or bound to all available processors)
[node1:24048] MCW rank 4 is not bound (or bound to all available processors)
[node1:24048] MCW rank 1 is not bound (or bound to all available processors)

Hello, world, I am  0 of  5: Open MPI v1.10, package: Open MPI jdelia@coyote
Distribution, ident: 1.10.0, repo rev: v1.10-dev-293-gf694355, Aug 24, 2015

Hello, world, I am  3 of  5: Open MPI v1.10, package: Open MPI jdelia@coyote
Distribution, ident: 1.10.0, repo rev: v1.10-dev-293-gf694355, Aug 24, 2015

Hello, world, I am  2 of  5: Open MPI v1.10, package: Open MPI jdelia@coyote
Distribution, ident: 1.10.0, repo rev: v1.10-dev-293-gf694355, Aug 24, 2015

Hello, world, I am  1 of  5: Open MPI v1.10, package: Open MPI jdelia@coyote
Distribution, ident: 1.10.0, repo rev: v1.10-dev-293-gf694355, Aug 24, 2015

Hello, world, I am  4 of  5: Open MPI v1.10, package: Open MPI jdelia@coyote
Distribution, ident: 1.10.0, repo rev: v1.10-dev-293-gf694355, Aug 24, 2015

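As an aside, if restricting the OOB to br0 turns out to be the permanent
fix, we could presumably also put the setting in the per-user MCA
parameters file, so that every mpirun picks it up without the extra
command-line flag. A minimal sketch, assuming the default per-user path
from the Open MPI FAQ:

  # $HOME/.openmpi/mca-params.conf -- per-user MCA parameter defaults
  oob_tcp_if_include = br0
  # optionally restrict the TCP BTL to the same interface:
  # btl_tcp_if_include = br0
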
> I note that your “node1” has two interfaces on the same subnet (192.168.1),
> which is usually a “no-no” that can cause trouble. Let’s see if removing
> that confusion helps.

OK, thanks for noticing. We will try to remove that duplicate interface
and let you know.
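
Before removing it, we can double-check which addresses on node1 share
the 192.168.1 subnet with something like:

  [jdelia@node1 ~]$ ip -4 addr show | grep 'inet '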

Regards,
Jorge.
 
> > On Sep 20, 2015, at 3:45 PM, Jorge D'Elia <jde...@intec.unl.edu.ar> wrote:
> > 
> > Hi Ralph,
> > 
> > Many thanks for your fast answer!
> > 
> > ----- Original Message -----
> >> From: "Ralph Castain" <r...@open-mpi.org>
> >> To: "Open MPI Users" <us...@open-mpi.org>
> >> Sent: Sunday, September 20, 2015 18:16:56
> >> Subject: Re: [OMPI users] send() to socket 9 failed with the 1.10.0 version
> >> but not with 1.8.7 one.
> >> 
> >> Is the connection from node1 to the head node a direct one,
> >> or is there a difference in the ethernet subnets between them?
> > 
> > The connection from node1 to the head node is a direct one:
> > the head node connects to the switch, and the switch connects
> > to the computing nodes.
> > 
> >> Can you show us the output of ifconfig from each node?
> > 
> > Yes, of course! Please see the attached tgz file, which also
> > contains the ompi_info logs.
> > 
> > Thanks.
> > Jorge.
> > 
> >>> On Sep 20, 2015, at 12:19 PM, Jorge D'Elia <jde...@intec.unl.edu.ar>
> >>> wrote:
> >>> 
> >>> Hi all,
> >>> 
> >>> We have used the Open MPI distributions up to version 1.8.7
> >>> without any problem on a small Linux cluster built with diskless
> >>> nodes (x86_64, Fedora 17, Linux version 4.1.1 (gcc version 4.7.2
> >>> 20120921 (Red Hat 4.7.2-2) (GCC))).
> >>> 
> >>> However, since the 1.8.8 version, we have had a problem with the
> >>> mpirun command.
> >>> 
> >>> For instance, with the 1.10.0 Open MPI version, we can successfully
> >>> launch MPI jobs across multiple node hosts and the server only if
> >>> they are launched from one of the nodes, but not from the server.
> >>> In order to fix this, following the hints given in
> >>> 
> >>> http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
> >>> 
> >>> we have tried a simple test:
> >>> 
> >>> [jdelia@coyote ~]$ which mpirun
> >>> /usr/beta/openmpi/bin/mpirun
> >>> [jdelia@coyote ~]$ mpirun --version
> >>> mpirun (Open MPI) 1.10.0
> >>> [jdelia@coyote ~]$ hostname
> >>> coyote
> >>> [jdelia@coyote ~]$ ssh node1
> >>> [jdelia@node1 ~]$ mpirun --host coyote hostname
> >>> coyote
> >>> [jdelia@node1 ~]$ exit
> >>> logout
> >>> Connection to node1 closed.
> >>> [jdelia@coyote ~]$ mpirun --host node1 hostname
> >>> [node1:17861] [[8026,0],1] tcp_peer_send_blocking: send() to socket 9
> >>> failed: Broken pipe (32)
> >>> --------------------------------------------------------------------------
> >>> ORTE was unable to reliably start one or more daemons.
> >>> This usually is caused by:
> >>> ... snip ...
> >>> --------------------------------------------------------------------------
> >>> 
> >>> The PATH and LD_LIBRARY_PATH on coyote (the server) and node1
> >>> were reduced to
> >>> 
> >>> [jdelia@coyote ]$ ssh coyote env | grep -i PATH
> >>> LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> >>> PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> >>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> >>> QT_PLUGIN_PATH=/usr/lib64/kde4/plugins:/usr/lib/kde4/plugins
> >>> 
> >>> [jdelia@coyote ]$ ssh node1  env | grep -i PATH
> >>> LD_LIBRARY_PATH=/usr/beta/openmpi/lib:/usr/beta/gcc-trunk/lib:/usr/beta/gcc-trunk/lib64:/usr/lib:/usr/lib64:/usr/local/lib:/usr/local/lib64
> >>> PATH=.:/usr/beta/openmpi/bin:/usr/beta/gcc-trunk/bin:/usr/lib64/ccache:/usr/bin:/usr/sbin/usr/local/bin:/usr/local/sbin
> >>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> >>> 
> >>> Up to the 1.8.7 version these tests were all OK. Then, several
> >>> Open MPI distributions were rebuilt using the gcc compilers,
> >>> both with the system version
> >>> 
> >>> gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
> >>> 
> >>> and with the experimental one
> >>> 
> >>> $ gcc --version
> >>> gcc (GCC) 6.0.0 20150919 (experimental)
> >>> 
> >>> but without luck. Again, if we go back to the 1.8.7 version and
> >>> use the same environment variables, all tests are OK.
> >>> 
> >>> Please, do you have any clue on how to fix this trouble?
> >>> 
> >>> We attach the configure log files of the 1.8.7
> >>> and 1.8.10 versions built with the beta gcc compiler.
> >>> 
> >>> Thanks in advance.
> >>> 
> >>> Regards,
> >>> Jorge.
> >>> --
> >>> CIMEC (UNL-CONICET), http://www.cimec.org.ar/
> >>> Predio CONICET-Santa Fe, Colec. Ruta Nac. 168,
> >>> Paraje El Pozo, S3000GLN, Santa Fe, ARGENTINA
> >>> Univ Nac Litoral (UNL). Cons Nac Inv Científ y Técn (CONICET)
> >>> <logs.tgz>
> >> 
> > <ifconfig-ompi-info-log.tgz>
> 
