FWIW: I have fixed this on the development trunk, and Jeff has scheduled it for the upcoming 1.6 release (when the 1.5 series rolls over). I don't expect we'll backport it to 1.4 unless someone really needs it there.
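
In the meantime, anyone stuck on 1.4 can point at the hostfile explicitly. As a sketch only (the parameter name and per-user config file location are from memory - please double-check with ompi_info): either pass it on the command line, as was done below,

mpiexec --default-hostfile <prefix>/etc/openmpi-default-hostfile -np 4 ./mpihello

or set what I believe is the orte_default_hostfile MCA parameter once per user, e.g. in ~/.openmpi/mca-params.conf:

orte_default_hostfile = <prefix>/etc/openmpi-default-hostfile
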
Thanks!
Ralph

On Feb 1, 2012, at 9:31 AM, Ralph Castain wrote:

> Ah - crud. Looks like the default-hostfile MCA param isn't getting set to the default value. Will resolve - thanks!
>
> On Feb 1, 2012, at 9:28 AM, Reuti wrote:
>
>> On 01.02.2012, at 17:16, Ralph Castain wrote:
>>
>>> Could you add --display-allocation to your cmd line? This will tell us if it found/read the default hostfile, or if the problem is with the mapper.
>>
>> Sure:
>>
>> reuti@pc15370:~> mpiexec --display-allocation -np 4 ./mpihello
>>
>> ======================   ALLOCATED NODES   ======================
>>
>>  Data for node: Name: pc15370  Num slots: 1  Max slots: 0
>>
>> =================================================================
>> Hello World from Node 0.
>> Hello World from Node 1.
>> Hello World from Node 2.
>> Hello World from Node 3.
>>
>> (Nothing in `strace` about accessing something with "default".)
>>
>> reuti@pc15370:~> mpiexec --default-hostfile local/openmpi-1.4.4-thread/etc/openmpi-default-hostfile --display-allocation -np 4 ./mpihello
>>
>> ======================   ALLOCATED NODES   ======================
>>
>>  Data for node: Name: pc15370  Num slots: 2  Max slots: 0
>>  Data for node: Name: pc15381  Num slots: 2  Max slots: 0
>>
>> =================================================================
>> Hello World from Node 0.
>> Hello World from Node 3.
>> Hello World from Node 2.
>> Hello World from Node 1.
>>
>> Specifying it explicitly works fine, with the correct distribution visible in `ps`.
>>
>> -- Reuti
>>
>>> On Feb 1, 2012, at 7:58 AM, Reuti wrote:
>>>
>>>> On 01.02.2012, at 15:38, Ralph Castain wrote:
>>>>
>>>>> On Feb 1, 2012, at 3:49 AM, Reuti wrote:
>>>>>
>>>>>> On 31.01.2012, at 21:25, Ralph Castain wrote:
>>>>>>
>>>>>>> On Jan 31, 2012, at 12:58 PM, Reuti wrote:
>>>>>>
>>>>>> BTW: is there any default hostfile for Open MPI - I mean one in my home directory or /etc? When I check `man orte_hosts` and all possible options are unset (as in a singleton run), it will only run locally ("Job is co-located with mpirun").
>>>>>
>>>>> Yep - it is <prefix>/etc/openmpi-default-hostfile
>>>>
>>>> Thanks for replying, Ralph.
>>>>
>>>> I spotted it too, but it is not working for me - neither for mpiexec from the command line nor for any singleton. I also tried a plain /etc as the location of this file.
>>>>
>>>> reuti@pc15370:~> which mpicc
>>>> /home/reuti/local/openmpi-1.4.4-thread/bin/mpicc
>>>> reuti@pc15370:~> cat /home/reuti/local/openmpi-1.4.4-thread/etc/openmpi-default-hostfile
>>>> pc15370 slots=2
>>>> pc15381 slots=2
>>>> reuti@pc15370:~> mpicc -o mpihello mpihello.c
>>>> reuti@pc15370:~> mpiexec -np 4 ./mpihello
>>>> Hello World from Node 0.
>>>> Hello World from Node 1.
>>>> Hello World from Node 2.
>>>> Hello World from Node 3.
>>>>
>>>> But everything runs locally (no spawn here, a traditional mpihello):
>>>>
>>>> 19503 ?        Ss     0:00 /usr/sbin/sshd -o PidFile=/var/run/sshd.init.pid
>>>> 11583 ?        Ss     0:00  \_ sshd: reuti [priv]
>>>> 11585 ?        S      0:00  |   \_ sshd: reuti@pts/6
>>>> 11587 pts/6    Ss     0:00  |       \_ -bash
>>>> 13470 pts/6    S+     0:00  |           \_ mpiexec -np 4 ./mpihello
>>>> 13471 pts/6    R+     0:00  |               \_ ./mpihello
>>>> 13472 pts/6    R+     0:00  |               \_ ./mpihello
>>>> 13473 pts/6    R+     0:00  |               \_ ./mpihello
>>>> 13474 pts/6    R+     0:00  |               \_ ./mpihello
>>>>
>>>> -- Reuti
>>>>
>>>>>>> We probably aren't correctly marking the original singleton on that node, and so the mapper thinks there are still two slots available on the original node.
>>>>>>
>>>>>> Okay.
>>>>>> There is something to discuss/fix. BTW: if started as a singleton, I get an error at the end with the program the OP provided:
>>>>>>
>>>>>> [pc15381:25502] [[12435,0],1] routed:binomial: Connection to lifeline [[12435,0],0] lost
>>>>>
>>>>> Okay, I'll take a look at it - but it may take a while before I can address either issue as other priorities loom.
>>>>>
>>>>>> It's not the case if run by mpiexec.
>>>>>>
>>>>>> -- Reuti
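
For reference, mpihello.c itself was never posted in this thread. A minimal program consistent with the output shown above (it prints its rank as "Node N") would look roughly like this - an illustrative reconstruction, not the poster's actual source:

#include <mpi.h>
#include <stdio.h>

/* Minimal MPI hello world matching the "Hello World from Node N." output
 * seen in the transcripts above; the original mpihello.c was not posted,
 * so treat this as a sketch. */
int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello World from Node %d.\n", rank);
    MPI_Finalize();

    return 0;
}

Built with mpicc -o mpihello mpihello.c and launched with mpiexec -np 4 ./mpihello, as in the transcripts above.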