Thank you, Gilles. At least I now have something to follow through with.

FYI, the torque install is the pre-built package from the EPEL (Extra
Packages for Enterprise Linux) repository:
torque-4.2.10-10.el7.x86_64

Normally pre-built packages have no problems, but not in this case.




On Tue, Oct 3, 2017 at 3:39 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Anthony,
>
>
> we had a similar issue reported some time ago (i.e. Open MPI ignores
> the torque allocation),
>
> and after quite some troubleshooting, we ended up with the same behavior
> (i.e. pbsdsh is not working as expected).
>
> see https://www.mail-archive.com/users@lists.open-mpi.org/msg29952.html
> for the last email.
>
>
> from an Open MPI point of view, I would consider the root cause to be
> your torque install.
>
> this case was reported at
> http://www.clusterresources.com/pipermail/torqueusers/2016-September/018858.html
>
> and no conclusion was reached.
>
>
> Cheers,
>
>
> Gilles
>
>
> On 10/3/2017 2:02 PM, Anthony Thyssen wrote:
>
>> The stdout and stderr are saved to separate channels.
>>
>> It is interesting that the output from pbsdsh is node21.emperor 5 times,
>> even though $PBS_NODES lists the 5 individual nodes.
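
A quick way to make that symptom explicit is to compare the hosts listed in the node file against the hosts that actually answered. Below is a minimal Python sketch, assuming the node file contents (the file Torque names in $PBS_NODEFILE) and the captured `pbsdsh hostname` output are passed in as plain text, and that both use the same (fully qualified) form of each host name. The function name `unused_hosts` is hypothetical, not part of Torque or Open MPI.

```python
from typing import List


def unused_hosts(nodefile_text: str, observed_text: str) -> List[str]:
    """Return allocated hosts that never appear in the observed output.

    nodefile_text: node file contents, one hostname per line (a host may
        repeat once per allocated slot).
    observed_text: captured command output, one hostname per line,
        e.g. from `pbsdsh hostname`.
    """
    # De-duplicate allocated hosts while keeping node-file order.
    allocated = list(dict.fromkeys(
        line.strip() for line in nodefile_text.splitlines() if line.strip()))
    # Set of hosts that actually produced output.
    observed = {line.strip() for line in observed_text.splitlines() if line.strip()}
    return [host for host in allocated if host not in observed]


if __name__ == "__main__":
    # Host names taken from the node file listing in this thread.
    nodefile = "\n".join(["node21.emperor", "node25.emperor", "node24.emperor",
                          "node23.emperor", "node22.emperor"])
    # pbsdsh reported node21.emperor five times.
    pbsdsh_output = "node21.emperor\n" * 5
    print(unused_hosts(nodefile, pbsdsh_output))
```

With the host names from this thread it prints the four nodes other than node21.emperor; an empty list would mean every allocated node ran at least one task.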
>>
>> Attached are the two compressed files, as well as the pbs_hello batch
>> script used.
>>
>> Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
>>  --------------------------------------------------------------------------
>>   There are two types of encryption:
>>     One that will prevent your sister from reading your diary, and
>>     One that will prevent your government.           -- Bruce Schneier
>>  --------------------------------------------------------------------------
>>
>>
>>
>>
>> On Tue, Oct 3, 2017 at 2:39 PM, Gilles Gouaillardet <gil...@rist.or.jp>
>> wrote:
>>
>>     Anthony,
>>
>>
>>     in your script, can you add
>>
>>
>>     set -x
>>
>>     env
>>
>>     pbsdsh hostname
>>
>>     mpirun --display-map --display-allocation \
>>         --mca ess_base_verbose 10 \
>>         --mca plm_base_verbose 10 \
>>         --mca ras_base_verbose 10 hostname
>>
>>
>>     and then compress and send the output?
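
For anyone repeating this, the same diagnostics could be captured and compressed in one step. A minimal Python sketch, assuming it is run from inside the batch job (so pbsdsh and mpirun can see the Torque allocation); the helper `run_and_capture` and the output name `diag.txt.gz` are hypothetical. Python has no direct equivalent of `set -x`, so each command line is written to the log before it runs, and `env` is replaced by dumping `os.environ`.

```python
import gzip
import os
import subprocess
import sys
from typing import List, Sequence

# The mpirun invocation requested above, as an argument list.
MPIRUN_CMD: List[str] = [
    "mpirun", "--display-map", "--display-allocation",
    "--mca", "ess_base_verbose", "10",
    "--mca", "plm_base_verbose", "10",
    "--mca", "ras_base_verbose", "10",
    "hostname",
]
DIAG_COMMANDS: List[List[str]] = [["pbsdsh", "hostname"], MPIRUN_CMD]


def run_and_capture(commands: Sequence[Sequence[str]],
                    out_path: str = "diag.txt.gz") -> None:
    """Write the environment, then each command's trace and output, to a gzip file."""
    with gzip.open(out_path, "wt", encoding="utf-8", errors="replace") as out:
        # Stand-in for `env`: record the environment Torque gave the job.
        for key in sorted(os.environ):
            out.write(f"{key}={os.environ[key]}\n")
        for cmd in commands:
            # Stand-in for `set -x`: log the command line before running it.
            out.write("+ " + " ".join(cmd) + "\n")
            try:
                # Merge stderr into stdout so both streams land in the same log.
                result = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT,
                                        encoding="utf-8", errors="replace")
            except FileNotFoundError:
                out.write(f"[{cmd[0]}: command not found]\n")
                continue
            out.write(result.stdout)
            out.write(f"[exit status {result.returncode}]\n")


if __name__ == "__main__":
    run_and_capture(DIAG_COMMANDS, sys.argv[1] if len(sys.argv) > 1 else "diag.txt.gz")
```

The resulting single gzip file can then be attached to a reply as-is.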
>>
>>
>>     Cheers,
>>
>>
>>     Gilles
>>
>>
>>     On 10/3/2017 1:19 PM, Anthony Thyssen wrote:
>>
>>         I noticed that too. Although jobs are submitted to torque from a
>>         different host (the main head node, "shrek"), "node21" is the host
>>         on which torque runs the batch script (and hence the mpirun
>>         command), as it is the first node in the "dualcore" resource group.
>>
>>         Adding option...
>>
>>         It fixed the hostname in the allocation map, but had no effect on
>>         the outcome. The allocation is still simply ignored.
>>
>>         =======8<--------CUT HERE----------
>>         PBS Job Number       9000
>>         PBS batch run on     node21.emperor
>>         Time it was started  2017-10-03_14:11:20
>>         Current Directory    /net/shrek.emperor/home/shrek/anthony
>>         Submitted work dir   /home/shrek/anthony/mpi-pbs
>>         Number of Nodes      5
>>         Nodefile List       /var/lib/torque/aux//9000.shrek.emperor
>>         node21.emperor
>>         node25.emperor
>>         node24.emperor
>>         node23.emperor
>>         node22.emperor
>>         ---------------------------------------
>>
>>         ======================  ALLOCATED NODES  ======================
>>         node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>>         node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>>         node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>>         node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>>         node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
>>         =================================================================
>>         node21.emperor
>>         node21.emperor
>>         node21.emperor
>>         node21.emperor
>>         node21.emperor
>>         =======8<--------CUT HERE----------
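
The allocation display above lists five hosts with one slot each, yet all five hostname lines come from node21.emperor. A minimal Python sketch for checking that mechanically, assuming the `--display-allocation` lines have the `host: slots=N ...` shape shown above and that `mpirun hostname` without `-np` starts one process per allocated slot; `parse_allocation` and `placement_matches` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches allocation lines such as
# "node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP".
ALLOC_LINE = re.compile(r"^\s*(?P<host>[\w.-]+):\s+slots=(?P<slots>\d+)\b")


def parse_allocation(text: str) -> Dict[str, int]:
    """Map each host in a --display-allocation block to its slot count."""
    slots: Dict[str, int] = {}
    for line in text.splitlines():
        match = ALLOC_LINE.match(line)
        if match:
            slots[match.group("host")] = int(match.group("slots"))
    return slots


def placement_matches(allocation: Dict[str, int], observed_hosts: Iterable[str]) -> bool:
    """True if each allocated host ran exactly as many processes as it has slots."""
    counts = Counter(host.strip() for host in observed_hosts if host.strip())
    return dict(counts) == {h: n for h, n in allocation.items() if n > 0}


if __name__ == "__main__":
    # Allocation block and hostname output copied from the job output above.
    allocation_block = """
======================  ALLOCATED NODES  ======================
node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
"""
    observed = ["node21.emperor"] * 5
    allocation = parse_allocation(allocation_block)
    print(allocation)
    print(placement_matches(allocation, observed))
```

For the output in this job the check fails: Open MPI reports the full allocation, but the processes do not land on it.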
>>
>>
>>           Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
>>          --------------------------------------------------------------------------
>>            The equivalent of an armoured car should always be used to
>>            protect any secret kept in a cardboard box.
>>            -- Anthony Thyssen, On the use of Encryption
>>          --------------------------------------------------------------------------
>>
>>
>>
>>
>>         _______________________________________________
>>         users mailing list
>>         users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
>>         https://lists.open-mpi.org/mailman/listinfo/users
>>         <https://lists.open-mpi.org/mailman/listinfo/users>
>>
>>
>>
>>
>>
>>
>>
>
>
