Hi,

On 25.05.2010, at 09:14, Eloi Gaudry wrote:

> I do not reset any environment variable during job submission or job handling.
> Is there a simple way to check that openmpi is working as expected with SGE
> tight integration (e.g. by displaying environment variables, setting options
> on the command line, etc.)?

a) put a command:

env

in the jobscript and check the output for $JOB_ID and various $SGE_* variables.
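
For example, a minimal jobscript sketch (the PE name, the memory_free request,
the mpiexec path and the program name are only placeholders based on what you
described earlier in this thread, so adjust them to your actual setup):

#!/bin/sh
#$ -pe round_robin 8
#$ -l memory_free=14G
#$ -cwd
# Print the environment SGE hands to the job; with a working Tight
# Integration you should see JOB_ID, PE_HOSTFILE, NSLOTS and several
# SGE_* variables here.
env | grep -E 'JOB_ID|SGE_|PE_|NSLOTS'
# then start the parallel program as usual
/opt/openmpi-1.3.3/bin/mpiexec ./your_program

If $JOB_ID and the SGE_* variables are missing in this output, Open MPI cannot
detect that it is running under SGE and will fall back to its own rsh/ssh
startup.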

b) to confirm the misbehavior: are the tasks on the slave nodes children of 
sge_shepherd, or of a system sshd/rshd?
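
You can check this on one of the slave nodes while the job is running, e.g. on
Linux (just a sketch, the exact ps options may differ on other systems):

# show the process tree; look for a chain roughly like
# sge_execd -> sge_shepherd -> qrsh_starter -> orted -> MPI processes
ps -e f

If the remote orted/MPI processes hang under an sshd or rshd instead of under
sge_shepherd, they were started outside of SGE's control and the Tight
Integration (and with it SGE's accounting and job control) is bypassed.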

-- Reuti


> Regards,
> Eloi
> 
> 
> On Friday 21 May 2010 17:35:24 Reuti wrote:
>> Hi,
>> 
>> On 21.05.2010, at 17:19, Eloi Gaudry wrote:
>>> Hi Reuti,
>>> 
>>> Yes, the openmpi binaries used were built with --with-sge passed to
>>> configure, and we only use those binaries on our cluster.
>>> 
>>> [eg@moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
>>> 
>>>                MCA ras: gridengine (MCA v2.0, API v2.0, Component
>>>                v1.3.3)
>> 
>> ok. As you have a Tight Integration as the goal and set "control_slaves
>> TRUE" in your PE, SGE wouldn't allow `qrsh -inherit ...` to nodes which
>> are not in the list of granted nodes. So it looks like your job is
>> running outside of this Tight Integration with its own `rsh` or `ssh`.
>> 
>> Do you reset $JOB_ID or other environment variables in your jobscript,
>> which could trigger Open MPI to assume that it's not running inside SGE?
>> 
>> -- Reuti
>> 
>>> On Friday 21 May 2010 16:01:54 Reuti wrote:
>>>> Hi,
>>>> 
>>>> On 21.05.2010, at 14:11, Eloi Gaudry wrote:
>>>>> Hi there,
>>>>> 
>>>>> I'm observing something strange on our cluster managed by SGE6.2u4 when
>>>>> launching a parallel computation on several nodes, using OpenMPI/SGE
>>>>> tight-integration mode (OpenMPI-1.3.3). It seems that the SGE-allocated
>>>>> slots are not used by OpenMPI, as if OpenMPI were doing its own
>>>>> round-robin allocation based on the allocated node hostnames.
>>>> 
>>>> did you compile Open MPI with --with-sge (and recompile your
>>>> applications)? Are you using the correct mpiexec?
>>>> 
>>>> -- Reuti
>>>> 
>>>>> Here is what I'm doing:
>>>>> - launch a parallel computation involving 8 processors, each of them
>>>>> using 14GB of memory. I'm using a qsub command where I request the
>>>>> memory_free resource and use tight integration with openmpi
>>>>> - 3 servers are available:
>>>>> . barney with 4 cores (4 slots) and 32GB
>>>>> . carl with 4 cores (4 slots) and 32GB
>>>>> . charlie with 8 cores (8 slots) and 64GB
>>>>> 
>>>>> Here is the output of the allocated nodes (OpenMPI output):
>>>>> ======================   ALLOCATED NODES   ======================
>>>>> 
>>>>> Data for node: Name: charlie   Launch id: -1 Arch: ffc91200  State: 2
>>>>> 
>>>>> Daemon: [[44332,0],0] Daemon launched: True
>>>>> Num slots: 4  Slots in use: 0
>>>>> Num slots allocated: 4  Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0  Next node_rank: 0
>>>>> 
>>>>> Data for node: Name: carl.fft    Launch id: -1 Arch: 0 State: 2
>>>>> 
>>>>> Daemon: Not defined Daemon launched: False
>>>>> Num slots: 2  Slots in use: 0
>>>>> Num slots allocated: 2  Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0  Next node_rank: 0
>>>>> 
>>>>> Data for node: Name: barney.fft    Launch id: -1 Arch: 0 State: 2
>>>>> 
>>>>> Daemon: Not defined Daemon launched: False
>>>>> Num slots: 2  Slots in use: 0
>>>>> Num slots allocated: 2  Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0  Next node_rank: 0
>>>>> 
>>>>> =================================================================
>>>>> 
>>>>> Here is what I see when my computation is running on the cluster:
>>>>> #     rank       pid          hostname
>>>>> 
>>>>>       0     28112          charlie
>>>>>       1     11417          carl
>>>>>       2     11808          barney
>>>>>       3     28113          charlie
>>>>>       4     11418          carl
>>>>>       5     11809          barney
>>>>>       6     28114          charlie
>>>>>       7     11419          carl
>>>>> 
>>>>> Note that the parallel environment used under SGE is defined as:
>>>>> [eg@moe:~]$ qconf -sp round_robin
>>>>> pe_name            round_robin
>>>>> slots              32
>>>>> user_lists         NONE
>>>>> xuser_lists        NONE
>>>>> start_proc_args    /bin/true
>>>>> stop_proc_args     /bin/true
>>>>> allocation_rule    $round_robin
>>>>> control_slaves     TRUE
>>>>> job_is_first_task  FALSE
>>>>> urgency_slots      min
>>>>> accounting_summary FALSE
>>>>> 
>>>>> I'm wondering why OpenMPI didn't use the allocated nodes chosen by SGE
>>>>> (cf. "ALLOCATED NODES" report) but instead placed each process of the
>>>>> parallel computation one node at a time, using a round-robin method.
>>>>> 
>>>>> Note that I'm using the '--bynode' option in the orterun command line.
>>>>> If the behavior I'm observing is simply the consequence of using this
>>>>> option, please let me know. This would then mean that one needs to
>>>>> state that SGE tight-integration has a lower priority on orterun's
>>>>> behavior than the different command-line options.
>>>>> 
>>>>> Any help would be appreciated,
>>>>> Thanks,
>>>>> Eloi
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> -- 
> 
> 
> Eloi Gaudry
> 
> Free Field Technologies
> Axis Park Louvain-la-Neuve
> Rue Emile Francqui, 1
> B-1435 Mont-Saint Guibert
> BELGIUM
> 
> Company Phone: +32 10 487 959
> Company Fax:   +32 10 454 626

