I noticed that too. The submitting host for Torque is a different machine (the main head node, "shrek"), but "node21" is the host on which Torque runs the batch script (and therefore the mpirun command), since it is the first node in the "dualcore" resource group.
Adding that option fixed the hostname in the allocation map, but it had no effect on the outcome. The allocation is still simply ignored.

=======8<--------CUT HERE----------
PBS Job Number       9000
PBS batch run on     node21.emperor
Time it was started  2017-10-03_14:11:20
Current Directory    /net/shrek.emperor/home/shrek/anthony
Submitted work dir   /home/shrek/anthony/mpi-pbs
Number of Nodes      5
Nodefile List        /var/lib/torque/aux//9000.shrek.emperor
node21.emperor
node25.emperor
node24.emperor
node23.emperor
node22.emperor
---------------------------------------

======================   ALLOCATED NODES   ======================
    node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
    node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
    node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
    node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
    node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
=================================================================
node21.emperor
node21.emperor
node21.emperor
node21.emperor
node21.emperor
=======8<--------CUT HERE----------

  Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
 --------------------------------------------------------------------------
   The equivalent of an armoured car should always be used to protect
   any secret kept in a cardboard box.
                          -- Anthony Thyssen, On the use of Encryption
 --------------------------------------------------------------------------

On Tue, Oct 3, 2017 at 2:00 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> One thing I can see is that the local host (where mpirun executed) shows
> as "node21" in the allocation, while all others show their FQDN. This might
> be causing some confusion.
>
> You might try adding "--mca orte_keep_fqdn_hostnames 1" to your cmd line
> and see if that helps.
>
>
> On Oct 2, 2017, at 8:14 PM, Anthony Thyssen <a.thys...@griffith.edu.au>
> wrote:
>
> Update...
> The problem of all processes running on the first node (an oversubscribed
> dual-core machine) is NOT resolved.
>
> I changed the mpirun command in the Torque batch script ("pbs_hello" - see
> previous) to
>
>    mpirun --nooversubscribe --display-allocation hostname
>
> and then submitted it to PBS/Torque using
>
>    qsub -l nodes=5:ppn=1:dualcore pbs_hello
>
> to run on 5 dual-core machines. This produces the following result...
>
> =======8<--------CUT HERE----------
> PBS Job Number       8996
> PBS batch run on     node21.emperor
> Time it was started  2017-10-03_13:04:07
> Current Directory    /net/shrek.emperor/home/shrek/anthony
> Submitted work dir   /home/shrek/anthony/mpi-pbs
> Number of Nodes      5
> Nodefile List        /var/lib/torque/aux//8996.shrek.emperor
> node21.emperor
> node25.emperor
> node24.emperor
> node23.emperor
> node22.emperor
> ---------------------------------------
>
> ======================   ALLOCATED NODES   ======================
>     node21: slots=1 max_slots=0 slots_inuse=0 state=UP
>     node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>     node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>     node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>     node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> node21.emperor
> node21.emperor
> node21.emperor
> node21.emperor
> node21.emperor
> =======8<--------CUT HERE----------
>
> The $PBS_NODEFILE shows Torque requesting 5 processes on 5 separate
> machines.
>
> The mpirun command's "ALLOCATED NODES" shows it picked up the request
> correctly from Torque.
>
> But the "hostname" output still shows ALL processes were run on the first
> node only!
>
> Even though I requested it not to oversubscribe.
>
>
> I am at a complete loss as to how to solve this problem.
>
> ANY and all suggestions, or even ways I can get other information as to
> what is causing this, will be most welcome.
>
>
>   Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
>  --------------------------------------------------------------------------
>   Using encryption on the Internet is the equivalent of arranging
>   an armored car to deliver credit-card information from someone
>   living in a cardboard box to someone living on a park bench.
>                                                      -- Gene Spafford
>  --------------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
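
The comparison made by eye above (nodefile vs. "hostname" output) can also be scripted inside the batch job. What follows is only a rough sketch, not something from the original script: the helper names count_unique_hosts and check_spread are made up for illustration, and it assumes that Torque sets $PBS_NODEFILE inside the job and that mpirun is on the PATH there.

```sh
#!/bin/sh
# Sketch only: count_unique_hosts and check_spread are hypothetical helper
# names, not part of Torque or Open MPI.

# Print the number of distinct, non-empty host names in a file
# (one host name per line, as in the Torque nodefile).
count_unique_hosts() {
    awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Succeed only if the number of distinct hosts that actually ran a process
# ($2, the "hostname" output) equals the number of distinct hosts listed
# in the nodefile ($1).
check_spread() {
    expected=$(count_unique_hosts "$1")
    actual=$(count_unique_hosts "$2")
    if [ "$expected" -eq "$actual" ]; then
        echo "OK: processes ran on $actual of $expected allocated hosts"
        return 0
    fi
    echo "MISMATCH: processes ran on $actual of $expected allocated hosts" >&2
    return 1
}

# Only attempt the real run inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    out=$(mktemp)
    mpirun --nooversubscribe hostname > "$out"
    check_spread "$PBS_NODEFILE" "$out"
    rm -f "$out"
fi
```

Fed the nodefile and "hostname" output quoted in this thread, the check would report 1 of 5 allocated hosts, which is exactly the symptom described. It only detects the problem; it says nothing about the cause.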