Thank you Gilles.  At least I now have something to follow through with.  As an FYI, the Torque in question is the pre-built version from the Red Hat Extras (EPEL) archive: torque-4.2.10-10.el7.x86_64.  Normally pre-built packages have no problems, but apparently not in this case.
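For anyone wanting to double-check a similar setup, this is the sort of sanity check I would run on the head node.  It assumes the EPEL Torque packages and that ompi_info and pbsnodes are on the PATH; node21.emperor is simply the first node of the job shown below:

    # list the Torque packages actually installed from EPEL
    rpm -qa 'torque*'

    # confirm this Open MPI build includes the Torque (tm) components (ras/plm)
    ompi_info | grep -i ' tm '

    # ask Torque what state it thinks the first allocated node is in
    pbsnodes node21.emperor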
On Tue, Oct 3, 2017 at 3:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> Anthony,
>
> we had a similar issue reported some times ago (e.g. Open MPI ignores
> torque allocation), and after quite some troubleshooting, we ended up
> with the same behavior (e.g. pbsdsh is not working as expected).
>
> see https://www.mail-archive.com/users@lists.open-mpi.org/msg29952.html
> for the last email.
>
> from an Open MPI point of view, i would consider the root cause is with
> your torque install.
>
> this case was reported at
> http://www.clusterresources.com/pipermail/torqueusers/2016-September/018858.html
> and no conclusion was reached.
>
> Cheers,
>
> Gilles
>
> On 10/3/2017 2:02 PM, Anthony Thyssen wrote:
>
>> The stdin and stdout are saved to separate channels.
>>
>> It is interesting that the output from pbsdsh is node21.emperor 5 times,
>> even though $PBS_NODES is the 5 individual nodes.
>>
>> Attached are the two compressed files, as well as the pbs_hello batch
>> used.
>>
>> Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
>> --------------------------------------------------------------------------
>>   There are two types of encryption:
>>     One that will prevent your sister from reading your diary, and
>>     One that will prevent your government.        -- Bruce Schneier
>> --------------------------------------------------------------------------
>>
>> On Tue, Oct 3, 2017 at 2:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>> Anthony,
>>
>> in your script, can you
>>
>> set -x
>> env
>> pbsdsh hostname
>> mpirun --display-map --display-allocation --mca ess_base_verbose 10 --mca plm_base_verbose 10 --mca ras_base_verbose 10 hostname
>>
>> and then compress and send the output ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 10/3/2017 1:19 PM, Anthony Thyssen wrote:
>>
>> I noticed that too.  Though the submitting host for torque is
>> a different host (main head node, "shrek"), "node21" is the
>> host that torque runs the batch script (and the mpirun
>> command), it being the first node in the "dualcore" resource group.
>>
>> Adding option...
>>
>> It fixed the hostname in the allocation map, though had no
>> effect on the outcome.  The allocation is still simply ignored.
>>
>> =======8<--------CUT HERE----------
>> PBS Job Number      9000
>> PBS batch run on    node21.emperor
>> Time it was started 2017-10-03_14:11:20
>> Current Directory   /net/shrek.emperor/home/shrek/anthony
>> Submitted work dir  /home/shrek/anthony/mpi-pbs
>> Number of Nodes     5
>> Nodefile List       /var/lib/torque/aux//9000.shrek.emperor
>> node21.emperor
>> node25.emperor
>> node24.emperor
>> node23.emperor
>> node22.emperor
>> ---------------------------------------
>>
>> ======================   ALLOCATED NODES   ======================
>> node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>> node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>> node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>> node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>> node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>> =================================================================
>> node21.emperor
>> node21.emperor
>> node21.emperor
>> node21.emperor
>> node21.emperor
>> =======8<--------CUT HERE----------
>>
>> Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
>> --------------------------------------------------------------------------
>>   The equivalent of an armoured car should always be used to
>>   protect any secret kept in a cardboard box.
>>                          -- Anthony Thyssen, On the use of Encryption
>> --------------------------------------------------------------------------
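For the archive, a minimal sketch of a pbs_hello-style batch script with the diagnostics Gilles asked for wired in.  The #PBS directives here are illustrative placeholders (the actual job requested five nodes from the "dualcore" group), not the exact script that was attached:

    #!/bin/bash
    #PBS -N pbs_hello
    #PBS -l nodes=5        # placeholder; the real job asked for the "dualcore" nodes
    cd $PBS_O_WORKDIR

    set -x                 # trace each command into the job's stderr file
    env                    # dump the PBS_* environment torque hands the job
    cat $PBS_NODEFILE      # the allocation torque wrote for this job

    pbsdsh hostname        # launch via torque's own TM interface

    mpirun --display-map --display-allocation \
           --mca ess_base_verbose 10 --mca plm_base_verbose 10 \
           --mca ras_base_verbose 10 hostname

As noted earlier in the thread, pbsdsh itself was already reporting node21.emperor five times, which is why the suspicion falls on the Torque install rather than on Open MPI.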
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users