Could you add --display-allocation to your command line? This will tell us whether it found and read the default hostfile, or whether the problem is with the mapper.
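For example (a sketch based on the invocation shown below - adjust -np and the program name to whatever you actually run):

  mpiexec --display-allocation -np 4 ./mpihello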
On Feb 1, 2012, at 7:58 AM, Reuti wrote:

> On Feb 1, 2012, at 3:38 PM, Ralph Castain wrote:
>
>> On Feb 1, 2012, at 3:49 AM, Reuti wrote:
>>
>>> On Jan 31, 2012, at 9:25 PM, Ralph Castain wrote:
>>>
>>>> On Jan 31, 2012, at 12:58 PM, Reuti wrote:
>>>
>>> BTW: is there any default for a hostfile for Open MPI - I mean any in my
>>> home directory or /etc? When I check `man orte_hosts`, and all possible
>>> options are unset (like in a singleton run), it will only run locally (the
>>> job is co-located with mpirun).
>>
>> Yep - it is <prefix>/etc/openmpi-default-hostfile
>
> Thanks for replying, Ralph.
>
> I spotted it too, but it is not working for me - neither for mpiexec from
> the command line nor for a singleton. I also tried a plain /etc as the
> location of this file.
>
> reuti@pc15370:~> which mpicc
> /home/reuti/local/openmpi-1.4.4-thread/bin/mpicc
> reuti@pc15370:~> cat /home/reuti/local/openmpi-1.4.4-thread/etc/openmpi-default-hostfile
> pc15370 slots=2
> pc15381 slots=2
> reuti@pc15370:~> mpicc -o mpihello mpihello.c
> reuti@pc15370:~> mpiexec -np 4 ./mpihello
> Hello World from Node 0.
> Hello World from Node 1.
> Hello World from Node 2.
> Hello World from Node 3.
>
> But everything runs locally (no spawn here, traditional mpihello):
>
> 19503 ?     Ss  0:00 /usr/sbin/sshd -o PidFile=/var/run/sshd.init.pid
> 11583 ?     Ss  0:00  \_ sshd: reuti [priv]
> 11585 ?     S   0:00  |   \_ sshd: reuti@pts/6
> 11587 pts/6 Ss  0:00  |       \_ -bash
> 13470 pts/6 S+  0:00  |           \_ mpiexec -np 4 ./mpihello
> 13471 pts/6 R+  0:00  |               \_ ./mpihello
> 13472 pts/6 R+  0:00  |               \_ ./mpihello
> 13473 pts/6 R+  0:00  |               \_ ./mpihello
> 13474 pts/6 R+  0:00  |               \_ ./mpihello
>
> -- Reuti
>
>>>> We probably aren't correctly marking the original singleton on that node,
>>>> and so the mapper thinks there are still two slots available on the
>>>> original node.
>>>
>>> Okay. There is something to discuss/fix. BTW: if started as a singleton I
>>> get an error at the end with the program the OP provided:
>>>
>>> [pc15381:25502] [[12435,0],1] routed:binomial: Connection to lifeline
>>> [[12435,0],0] lost
>>
>> Okay, I'll take a look at it - but it may take a while before I can address
>> either issue as other priorities loom.
>>
>>> It's not the case if run by mpiexec.
>>>
>>> -- Reuti
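A related way to isolate the problem is to point mpiexec at the file explicitly via --hostfile (a sketch using the path from Reuti's output above; adjust to your own prefix):

  mpiexec --hostfile /home/reuti/local/openmpi-1.4.4-thread/etc/openmpi-default-hostfile -np 4 ./mpihello

If this spawns ranks on pc15381 while the plain "mpiexec -np 4 ./mpihello" stays local, the hostfile itself is fine and the failure is in the default-hostfile lookup rather than in the mapper.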