Anthony,

We had a similar issue reported some time ago (the "Open MPI ignores torque allocation" thread),

and after quite a bit of troubleshooting, we ended up with the same behavior (i.e. pbsdsh is not working as expected).

See https://www.mail-archive.com/users@lists.open-mpi.org/msg29952.html for the last email in that thread.


From an Open MPI point of view, I would consider the root cause to be in your torque install.

This case was reported on the torque users list at http://www.clusterresources.com/pipermail/torqueusers/2016-September/018858.html, but no conclusion was reached.
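
As a sanity check on the torque side only (no Open MPI involved), a minimal batch script along the lines of the sketch below should print each allocated host exactly once from pbsdsh; the resource request is just a placeholder, so adjust it to whatever you normally submit with.

    #!/bin/sh
    #PBS -l nodes=5:ppn=1
    # what torque says it allocated
    echo "--- nodefile ---"
    cat $PBS_NODEFILE
    # what the torque TM interface actually launches
    echo "--- pbsdsh ---"
    pbsdsh hostname

If pbsdsh already reports the first node five times here, Open MPI never enters the picture, which is why I would chase this on the torque side.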


Cheers,


Gilles


On 10/3/2017 2:02 PM, Anthony Thyssen wrote:
The stdout and stderr are saved to separate channels.

It is interesting that the output from pbsdsh is node21.emperor 5 times, even though $PBS_NODEFILE lists the 5 individual nodes.
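
(For the record, the mismatch reproduces with nothing more than the two commands below, run from inside the batch job; this is only a minimal sketch of what the attached output already shows.)

    # run from inside the torque batch job
    cat $PBS_NODEFILE     # lists the five allocated nodes, one per line
    pbsdsh hostname       # prints node21.emperor five times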

Attached are the two compressed files, as well as the pbs_hello batch used.

Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
 --------------------------------------------------------------------------
  There are two types of encryption:
    One that will prevent your sister from reading your diary, and
    One that will prevent your government from reading it.  -- Bruce Schneier
 --------------------------------------------------------------------------




On Tue, Oct 3, 2017 at 2:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

    Anthony,


    In your script, can you add


    set -x

    env

    pbsdsh hostname

    mpirun --display-map --display-allocation --mca ess_base_verbose 10 \
        --mca plm_base_verbose 10 --mca ras_base_verbose 10 hostname


    and then compress and send the output?
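
    For reference, a complete batch script along those lines might look like the sketch below; the two directives at the top are only placeholders for whatever you normally request, and the commands are exactly the ones above.

        #!/bin/sh
        #PBS -N pbs_hello_debug
        #PBS -l nodes=5:ppn=1
        # trace every command so the saved output shows exactly what ran
        set -x
        # dump the environment so the PBS_* variables are visible
        env
        # torque-side launch, independent of Open MPI
        pbsdsh hostname
        # Open MPI launch with verbose allocation/launch diagnostics
        mpirun --display-map --display-allocation \
            --mca ess_base_verbose 10 \
            --mca plm_base_verbose 10 \
            --mca ras_base_verbose 10 hostname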


    Cheers,


    Gilles


    On 10/3/2017 1:19 PM, Anthony Thyssen wrote:

        I noticed that too.  Though the submitting host for torque is
        a different host (the main head node, "shrek"), "node21" is the
        host on which torque runs the batch script (and the mpirun
        command), it being the first node in the "dualcore" resource group.

        Adding option...

        It fixed the hostname in the allocation map, though it had no
        effect on the outcome.  The allocation is still simply ignored.

        =======8<--------CUT HERE----------
        PBS Job Number       9000
        PBS batch run on     node21.emperor
        Time it was started  2017-10-03_14:11:20
        Current Directory    /net/shrek.emperor/home/shrek/anthony
        Submitted work dir   /home/shrek/anthony/mpi-pbs
        Number of Nodes      5
        Nodefile List       /var/lib/torque/aux//9000.shrek.emperor
        node21.emperor
        node25.emperor
        node24.emperor
        node23.emperor
        node22.emperor
        ---------------------------------------

        ======================  ALLOCATED NODES  ======================
        node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        =================================================================
        node21.emperor
        node21.emperor
        node21.emperor
        node21.emperor
        node21.emperor
        =======8<--------CUT HERE----------
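
        (One check that might be worth a try, sketched below and untested
        here: keep the torque allocation but force the ssh launcher. If
        that spreads the five processes, it would confirm the allocation
        is being read correctly and only the tm launch path is broken,
        which would line up with pbsdsh misbehaving too. It assumes
        passwordless ssh between the nodes.)

            # diagnostic only: launch over ssh instead of the tm interface,
            # while still using the nodes torque allocated
            mpirun --mca plm rsh -np 5 hostname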


          Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
         --------------------------------------------------------------------------
           The equivalent of an armoured car should always be used to
           protect any secret kept in a cardboard box.
           -- Anthony Thyssen, On the use of Encryption
         --------------------------------------------------------------------------




_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
