Okay, I found it - fix coming in a bit.
Thanks!
Ralph
On Mar 21, 2013, at 4:02 PM, tmish...@jcity.maeda.co.jp wrote:

>
> Hi Ralph,
>
> Sorry for late reply. Here is my result.
>
> mpirun -v -np 8 -hostfile pbs_hosts -x OMP_NUM_THREADS --display-allocation
> -mca ras_base_verbose 5 -mca rmaps_base_verbose 5 /home/mishima/Ducom/testbed/mPre m02-ld
> [node04.cluster:28175] mca:base:select:( ras) Querying component [loadleveler]
> [node04.cluster:28175] [[29518,0],0] ras:loadleveler: NOT available for selection
> [node04.cluster:28175] mca:base:select:( ras) Skipping component [loadleveler]. Query failed to return a module
> [node04.cluster:28175] mca:base:select:( ras) Querying component [simulator]
> [node04.cluster:28175] mca:base:select:( ras) Skipping component [simulator]. Query failed to return a module
> [node04.cluster:28175] mca:base:select:( ras) Querying component [slurm]
> [node04.cluster:28175] [[29518,0],0] ras:slurm: NOT available for selection
> [node04.cluster:28175] mca:base:select:( ras) Skipping component [slurm]. Query failed to return a module
> [node04.cluster:28175] mca:base:select:( ras) Querying component [tm]
> [node04.cluster:28175] mca:base:select:( ras) Query of component [tm] set priority to 100
> [node04.cluster:28175] mca:base:select:( ras) Selected component [tm]
> [node04.cluster:28175] mca:rmaps:select: checking available component ppr
> [node04.cluster:28175] mca:rmaps:select: Querying component [ppr]
> [node04.cluster:28175] mca:rmaps:select: checking available component rank_file
> [node04.cluster:28175] mca:rmaps:select: Querying component [rank_file]
> [node04.cluster:28175] mca:rmaps:select: checking available component resilient
> [node04.cluster:28175] mca:rmaps:select: Querying component [resilient]
> [node04.cluster:28175] mca:rmaps:select: checking available component round_robin
> [node04.cluster:28175] mca:rmaps:select: Querying component [round_robin]
> [node04.cluster:28175] mca:rmaps:select: checking available component seq
> [node04.cluster:28175] mca:rmaps:select: Querying component [seq]
> [node04.cluster:28175] [[29518,0],0]: Final mapper priorities
> [node04.cluster:28175] Mapper: ppr Priority: 90
> [node04.cluster:28175] Mapper: seq Priority: 60
> [node04.cluster:28175] Mapper: resilient Priority: 40
> [node04.cluster:28175] Mapper: round_robin Priority: 10
> [node04.cluster:28175] Mapper: rank_file Priority: 0
> [node04.cluster:28175] [[29518,0],0] ras:base:allocate
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node04
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: not found -- added to list
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node04
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 2
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node04
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 3
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node04
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 4
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node03
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: not found -- added to list
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node03
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 2
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node03
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 3
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: got hostname node03
> [node04.cluster:28175] [[29518,0],0] ras:tm:allocate:discover: found -- bumped slots to 4
> [node04.cluster:28175] [[29518,0],0] ras:base:node_insert inserting 2 nodes
> [node04.cluster:28175] [[29518,0],0] ras:base:node_insert updating HNP info to 4 slots
> [node04.cluster:28175] [[29518,0],0] ras:base:node_insert node node03
>
> ====================== ALLOCATED NODES ======================
>
> Data for node: node04 Num slots: 4 Max slots: 0
> Data for node: node03 Num slots: 4 Max slots: 0
>
> =================================================================
> [node04.cluster:28175] HOSTFILE: CHECKING FILE NODE node04 VS LIST NODE node03
> --------------------------------------------------------------------------
> A hostfile was provided that contains at least one node not
> present in the allocation:
>
> hostfile: pbs_hosts
> node: node04
>
> If you are operating in a resource-managed environment, then only
> nodes that are in the allocation can be used in the hostfile. You
> may find relative node syntax to be a useful alternative to
> specifying absolute node names see the orte_hosts man page for
> further information.
> --------------------------------------------------------------------------
>
> Regards,
> Tetsuya Mishima
>
>> Hmmm...okay, let's try one more thing. Can you please add the following to your command line:
>>
>> -mca ras_base_verbose 5 -mca rmaps_base_verbose 5
>>
>> Appreciate your patience. For some reason, we are losing your head node from the allocation when we start trying to map processes. I'm trying to track down where this is happening so we can figure out why.
>>
>>
>> On Mar 20, 2013, at 10:32 PM, tmish...@jcity.maeda.co.jp wrote:
>>
>>>
>>> Hi Ralph,
>>>
>>> Here is the result on patched openmpi-1.7rc8.
>>>
>>> mpirun -v -np 8 -hostfile pbs_hosts -x OMP_NUM_THREADS
>>> --display-allocation /home/mishima/Ducom/testbed/mPre m02-ld
>>>
>>> ====================== ALLOCATED NODES ======================
>>>
>>> Data for node: node06 Num slots: 4 Max slots: 0
>>> Data for node: node05 Num slots: 4 Max slots: 0
>>>
>>> =================================================================
>>> [node06.cluster:21149] HOSTFILE: CHECKING FILE NODE node06 VS LIST NODE node05
>>> --------------------------------------------------------------------------
>>> A hostfile was provided that contains at least one node not
>>> present in the allocation:
>>>
>>> hostfile: pbs_hosts
>>> node: node06
>>>
>>> If you are operating in a resource-managed environment, then only
>>> nodes that are in the allocation can be used in the hostfile. You
>>> may find relative node syntax to be a useful alternative to
>>> specifying absolute node names see the orte_hosts man page for
>>> further information.
>>> --------------------------------------------------------------------------
>>>
>>> Regards,
>>> Tetsuya
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
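The verbose log shows two steps: the tm allocator building the node list by bumping a slot count each time a hostname repeats, and the hostfile check comparing each hostfile node against that list. The Python sketch below is a rough model of both, not Open MPI's code: `discover` and `hosts_missing_from_allocation` are hypothetical names, the job's host list is assumed to repeat each host once per slot (matching the four "got hostname" lines per node), and the contents of `pbs_hosts` are assumed because the thread does not show them.

```python
# Hypothetical model of two steps visible in the verbose log.
# These are NOT Open MPI functions; names and input data are illustrative only.


def discover(hostnames):
    """Count slots per node from a host list with one entry per slot.

    First sighting of a host corresponds to "not found -- added to list";
    later sightings correspond to "found -- bumped slots to N".
    """
    slots = {}  # dicts keep insertion order (Python 3.7+), like a node list
    for name in hostnames:
        slots[name] = slots.get(name, 0) + 1
    return slots


def hosts_missing_from_allocation(hostfile_nodes, allocated_nodes):
    """Return hostfile entries that do not appear in the allocated node list."""
    allocated = set(allocated_nodes)
    return [node for node in hostfile_nodes if node not in allocated]


if __name__ == "__main__":
    # Assumed input: each host listed once per slot, as the log suggests.
    allocation = discover(["node04"] * 4 + ["node03"] * 4)
    print(allocation)  # {'node04': 4, 'node03': 4}

    # Assumed contents of pbs_hosts (not shown in the thread).
    hostfile = ["node04", "node03"]

    # With the full allocation, every hostfile entry is found.
    print(hosts_missing_from_allocation(hostfile, allocation))  # []

    # If the head node is dropped before mapping, as suspected in the thread,
    # the check reports it, just as the error message names node04.
    without_head = [n for n in allocation if n != "node04"]
    print(hosts_missing_from_allocation(hostfile, without_head))  # ['node04']
```

In this model the check only sees whatever the node list holds when mapping starts, so a head node that appears under ALLOCATED NODES but is later lost from that list is still reported as "not present in the allocation".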