I don't see a hostfile on your command line - so I assume you are using a default hostfile? What is in it?
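(To make the question concrete: assuming the standard install layout under the prefix shown on your command line, the default hostfile would be

    /share/apps/openmpi/1.6.3/etc/openmpi-default-hostfile

and whether the entries in it are unqualified names like nh64-1-17 or fully-qualified ones like nh64-1-17.local is exactly the distinction that matters for the behavior you describe below.)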
On Jun 19, 2013, at 1:49 AM, Sergio Maffioletti <sergio.maffiole...@gmail.com> wrote:

> Hello,
>
> we have run into some strange behavior with OpenMPI 1.6.3:
>
>     strace -f /share/apps/openmpi/1.6.3/bin/mpiexec -n 2 \
>         --nooversubscribe --display-allocation --display-map --tag-output \
>         /share/apps/gamess/2011R1/gamess.2011R1.x \
>         /state/partition1/rmurri/29515/exam01.F05 \
>         -scr /state/partition1/rmurri/29515
>
>     ======================  ALLOCATED NODES  ======================
>
>      Data for node: nh64-1-17.local  Num slots: 0  Max slots: 0
>      Data for node: nh64-1-17        Num slots: 2  Max slots: 0
>
>     =================================================================
>
>     ========================  JOB MAP  ========================
>
>      Data for node: nh64-1-17        Num procs: 2
>        Process OMPI jobid: [37108,1] Process rank: 0
>        Process OMPI jobid: [37108,1] Process rank: 1
>
>     =============================================================
>
> As you can see, the host file lists the *unqualified* local host name;
> OpenMPI fails to recognize that as the same host where it is running,
> and uses `ssh` to spawn a remote `orted`, as use of `strace -f` shows:
>
>     Process 16552 attached
>     [pid 16552] execve("//usr/bin/ssh", ["/usr/bin/ssh", "-x",
>     "nh64-1-17", "OPAL_PREFIX=/share/apps/openmpi/1.6.3 ; export
>     OPAL_PREFIX; PATH=/share/apps/openmpi/1.6.3/bin:$PATH ; export PATH ;
>     LD_LIBRARY_PATH=/share/apps/openmpi/1.6.3/lib:$LD_LIBRARY_PATH ;
>     export LD_LIBRARY_PATH ;
>     DYLD_LIBRARY_PATH=/share/apps/openmpi/1.6.3/lib:$", "--daemonize",
>     "-mca", "ess", "env", "-mca", "orte_ess_jobid", "2431909888", "-mca",
>     "orte_ess_vpid", "1", "-mca", "orte_ess_num_procs", "2", "--hnp-uri",
>     "\"2431909888.0;tcp://10.1.255.237:33154\"", "-mca", "plm", "rsh"],
>     ["OLI235=/state/partition1/rmurri/29515/exam01.F235", ...
>
> If the machine file lists the FQDNs instead, `mpiexec` spawns the jobs
> directly via fork()/exec().
>
> This seems related to the fact that each compute node advertises
> 127.0.1.1 as the IP address associated with its hostname:
>
>     $ ssh nh64-1-17 getent hosts nh64-1-17
>     127.0.1.1       nh64-1-17.local nh64-1-17
>
> Indeed, if I change /etc/hosts so that a compute node associates a
> "real" IP with its hostname, `mpiexec` works as expected.
>
> Is this a known feature/bug/easter egg?
>
> For the record: using OpenMPI 1.6.3 on Rocks 5.2.
>
> Thanks,
> on behalf of the GC3 Team
> Sergio :)
>
> GC3: Grid Computing Competence Center
> http://www.gc3.uzh.ch/
> University of Zurich
> Winterthurerstrasse 190
> CH-8057 Zurich, Switzerland
> Tel: +41 44 635 4222
> Fax: +41 44 635 6888
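FWIW, the /etc/hosts change you describe would look roughly like this on the compute node (the 10.1.255.237 address is only a guess taken from the --hnp-uri in your strace output; substitute the node's actual interface address):

    # before: the hostname resolves to a loopback-style address
    127.0.1.1       nh64-1-17.local nh64-1-17

    # after: the hostname resolves to the node's routable cluster address
    10.1.255.237    nh64-1-17.local nh64-1-17

With the second form, `getent hosts nh64-1-17` returns the real address and, per your report, `mpiexec` then launches the local ranks directly via fork()/exec() instead of ssh'ing back to the same node.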