-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
Sent: Tuesday, April 3, 2012 17:13
To: Open MPI Users
Subject: Re: [OMPI users] sge tight integration leads to bad allocation

On 03.04.2012 at 16:59, Eloi Gaudry wrote:

> Hi Reuti,
> 
> I configured Open MPI to support SGE tight integration and used the PE
> defined below for submitting the job:
> 
> [16:36][eg@moe:~]$ qconf -sp fill_up
> pe_name            fill_up
> slots              80
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $fill_up

With this definition it should fill a host completely before moving to the next one.
[eg: ] Yes, and it should also make sure that all hard requirements are met.
Note that the allocation done by SGE is correct here; it is the allocation
finally used by Open MPI at startup that differs (and is incorrect).


> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
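For reference, a job would typically be submitted against this PE with something
like the following sketch (the slot count, script name and memory request are
illustrative assumptions, not values taken from this thread):

    # hypothetical submission using the fill_up PE; "run_job.sh" is a placeholder
    qsub -pe fill_up 4 -l mem_available=1.7G run_job.sh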
> 
> Here is the allocation info retrieved from `qstat -g t` for the related job:

For me the output of `qstat -g t` shows MASTER and SLAVE entries but no 
variables. Is there any wrapper defined for `qstat` to reformat the output (or 
a ~/.sge_qstat defined)?

[eg: ] Sorry, I forgot that ~/.sge_qstat was defined. As I don't have any slot 
available right now, I cannot relaunch the job to get updated output.
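If the reformatting comes from ~/.sge_qstat, one way to see the raw listing
again would be to temporarily disable the per-user defaults (a sketch; the user
name "jj" is taken from the output above, the backup file name is arbitrary):

    mv ~/.sge_qstat ~/.sge_qstat.off    # disable per-user qstat defaults
    qstat -g t -u jj                    # raw MASTER/SLAVE listing for that user's jobs
    mv ~/.sge_qstat.off ~/.sge_qstat    # restore the defaults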

And why is "num_proc=0" output everywhere - was it redefined? Usually it's a 
load sensor set to the number of cores found in the machine and shouldn't be 
turned into a consumable complex by hand.

[eg: ] My mistake, I think: it was made a consumable complex so that we could 
easily schedule multithreaded and parallel jobs on the cluster. I guess I should 
instead define another complex (proc_available), make it consumable, and consume 
from that complex rather than touching the num_proc load sensor...
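A minimal sketch of that alternative setup (the complex name, shortcut and
per-host value are assumptions, and the exact `qconf -mc` columns can vary
between Grid Engine releases):

    # 1) add the consumable via `qconf -mc`, e.g. a line like:
    #      proc_available   pa   INT   <=   YES   YES   0   0
    # 2) seed it on each execution host via `qconf -me <hostname>`:
    #      complex_values   proc_available=8
    # 3) request it at submission time instead of num_proc:
    qsub -pe fill_up 4 -l proc_available=1 run_job.sh   # run_job.sh is a placeholder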

-- Reuti


> ---------------------------------------------------------------------------------
> smp...@barney.fft              BIP   0/1/4          0.70     lx-amd64
>        hc:num_proc=0
>        hl:mem_free=31.215G
>        hl:mem_used=280.996M
>        hc:mem_available=1.715G
>   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     1
> ---------------------------------------------------------------------------------
> smp...@carl.fft                BIP   0/1/4          0.69     lx-amd64
>        hc:num_proc=0
>        hl:mem_free=30.764G
>        hl:mem_used=742.805M
>        hc:mem_available=1.715G
>   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     1
> ---------------------------------------------------------------------------------
> smp...@charlie.fft             BIP   0/2/8          0.57     lx-amd64
>        hc:num_proc=0
>        hl:mem_free=62.234G
>        hl:mem_used=836.797M
>        hc:mem_available=4.018G
>   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     2
> ---------------------------------------------------------------------------------
> 
> SGE reports what the pls_gridengine report shows, i.e. what was reserved.
> But here is the output of the current job (after it was started by Open MPI):
> [charlie:05294] ras:gridengine: JOB_ID: 1296
> [charlie:05294] ras:gridengine: PE_HOSTFILE: /opt/sge/default/spool/charlie/active_jobs/1296.1/pe_hostfile
> [charlie:05294] ras:gridengine: charlie.fft: PE_HOSTFILE shows slots=2
> [charlie:05294] ras:gridengine: barney.fft: PE_HOSTFILE shows slots=1
> [charlie:05294] ras:gridengine: carl.fft: PE_HOSTFILE shows slots=1
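For context, the PE_HOSTFILE parsed above contains one line per host of the
form "hostname slots queue processor_range". Based on the slot counts reported
in this run it would look roughly like the sketch below (queue names are
abbreviated as in the qstat output; the real file may differ):

    charlie.fft 2 smp...@charlie.fft UNDEFINED
    barney.fft 1 smp...@barney.fft UNDEFINED
    carl.fft 1 smp...@carl.fft UNDEFINED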
> 
> ======================   ALLOCATED NODES   ======================
> 
> Data for node: Name: charlie  Launch id: -1  Arch: ffc91200  State: 2
>   Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>   Daemon: [[54347,0],0]  Daemon launched: True
>   Num slots: 2  Slots in use: 0
>   Num slots allocated: 2  Max slots: 0
>   Username on node: NULL
>   Num procs: 0  Next node_rank: 0
> Data for node: Name: barney.fft  Launch id: -1  Arch: 0  State: 2
>   Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>   Daemon: Not defined  Daemon launched: False
>   Num slots: 1  Slots in use: 0
>   Num slots allocated: 1  Max slots: 0
>   Username on node: NULL
>   Num procs: 0  Next node_rank: 0
> Data for node: Name: carl.fft  Launch id: -1  Arch: 0  State: 2
>   Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>   Daemon: Not defined  Daemon launched: False
>   Num slots: 1  Slots in use: 0
>   Num slots allocated: 1  Max slots: 0
>   Username on node: NULL
>   Num procs: 0  Next node_rank: 0
> 
> =================================================================
> 
> Map generated by mapping policy: 0200
>   Npernode: 0  Oversubscribe allowed: TRUE  CPU Lists: FALSE
>   Num new daemons: 2  New daemon starting vpid 1
>   Num nodes: 3
> 
> Data for node: Name: charlie  Launch id: -1  Arch: ffc91200  State: 2
>   Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>   Daemon: [[54347,0],0]  Daemon launched: True
>   Num slots: 2  Slots in use: 2
>   Num slots allocated: 2  Max slots: 0
>   Username on node: NULL
>   Num procs: 2  Next node_rank: 2
>   Data for proc: [[54347,1],0]
>     Pid: 0  Local rank: 0  Node rank: 0
>     State: 0  App_context: 0  Slot list: NULL
>   Data for proc: [[54347,1],3]
>     Pid: 0  Local rank: 1  Node rank: 1
>     State: 0  App_context: 0  Slot list: NULL
> Data for node: Name: barney.fft  Launch id: -1  Arch: 0  State: 2
>   Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>   Daemon: [[54347,0],1]  Daemon launched: False
>   Num slots: 1  Slots in use: 1
>   Num slots allocated: 1  Max slots: 0
>   Username on node: NULL
>   Num procs: 1  Next node_rank: 1
>   Data for proc: [[54347,1],1]
>     Pid: 0  Local rank: 0  Node rank: 0
>     State: 0  App_context: 0  Slot list: NULL
> 
> Data for node: Name: carl.fft  Launch id: -1  Arch: 0  State: 2
>   Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>   Daemon: [[54347,0],2]  Daemon launched: False
>   Num slots: 1  Slots in use: 1
>   Num slots allocated: 1  Max slots: 0
>   Username on node: NULL
>   Num procs: 1  Next node_rank: 1
>   Data for proc: [[54347,1],2]
>     Pid: 0  Local rank: 0  Node rank: 0
>     State: 0  App_context: 0  Slot list: NULL
> 
> Regards,
> Eloi
> 
> 
> 
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
> Sent: Tuesday, April 3, 2012 16:24
> To: Open MPI Users
> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
> 
> Hi,
> 
> On 03.04.2012 at 16:12, Eloi Gaudry wrote:
> 
>> Thanks for your feedback.
>> No, it is the other way around: the "reserved" slots on all nodes are OK,
>> but the "used" slots are different.
>> 
>> Basically, I'm using SGE to schedule and book resources for a distributed 
>> job. When the job is finally launched, it uses a different allocation than 
>> the one that was reported by pls_gridengine_info.
>> 
>> The pls_gridengine_info report states that 3 nodes were booked: barney (1
>> slot), carl (1 slot) and charlie (2 slots). This booking was done by SGE
>> based on the memory requirements of the job (among other criteria).
>> 
>> When orterun starts the job (i.e. when SGE finally starts the scheduled
>> job), it uses 3 nodes, but the first one (barney: 2 slots used instead of 1)
>> is oversubscribed and the last one (charlie: 1 slot used instead of 2) is underused.
> 
> Did you configure Open MPI to support SGE tight integration and use a PE for
> submitting the job? Can you please post the definition of the PE?
> 
> What was the allocation you saw in SGE's `qstat -g t` for the job?
> 
> -- Reuti
> 
> 
>> If you need further information, please let me know.
>> 
>> Eloi
>> 
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
>> Sent: Tuesday, April 3, 2012 15:58
>> To: Open MPI Users
>> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
>> 
>> I'm afraid there isn't enough info here to help. Are you saying you only
>> allocated one slot per node, so the two slots on charlie are in error?
>> 
>> Sent from my iPad
>> 
>> On Apr 3, 2012, at 6:23 AM, "Eloi Gaudry" <eloi.gau...@fft.be> wrote:
>> 
>> Hi,
>> 
>> I've observed a strange behavior during rank allocation for a distributed run
>> scheduled and submitted using SGE (Son of Grid Engine 8.0.0d) and
>> OpenMPI-1.4.4.
>> Briefly, there is a one-slot difference between the slots allocated by SGE and
>> those used by Open MPI. The issue is that one node becomes oversubscribed at
>> runtime.
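As a side note, dumps like the "ALLOCATED NODES" and map listings below can be
produced with mpirun's display options; a hedged sketch (flag names assumed from
Open MPI 1.4-era mpirun, please check `mpirun --help`; "./my_app" is a
placeholder for the actual binary):

    mpirun --display-allocation --display-map -np $NSLOTS ./my_app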
>> 
>> Here is the output of the allocation done for gridengine:
>> 
>> ======================   ALLOCATED NODES   ======================
>> 
>> Data for node: Name: barney                 Launch id: -1      Arch: ffc91200   State: 2
>>               Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>               Daemon: [[22904,0],0]  Daemon launched: True
>>               Num slots: 1      Slots in use: 0
>>               Num slots allocated: 1   Max slots: 0
>>               Username on node: NULL
>>               Num procs: 0     Next node_rank: 0
>> Data for node: Name: carl.fft                  Launch id: -1      Arch: 0   State: 2
>>               Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>               Daemon: Not defined   Daemon launched: False
>>               Num slots: 1      Slots in use: 0
>>               Num slots allocated: 1   Max slots: 0
>>               Username on node: NULL
>>               Num procs: 0     Next node_rank: 0
>> Data for node: Name: charlie.fft                            Launch id: -1      Arch: 0   State: 2
>>               Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>               Daemon: Not defined   Daemon launched: False
>>               Num slots: 2      Slots in use: 0
>>               Num slots allocated: 2   Max slots: 0
>>               Username on node: NULL
>>               Num procs: 0     Next node_rank: 0
>> 
>> 
>> And here is the allocation finally used:
>> =================================================================
>> 
>> Map generated by mapping policy: 0200
>>               Npernode: 0      Oversubscribe allowed: TRUE   CPU Lists: FALSE
>>               Num new daemons: 2  New daemon starting vpid 1
>>               Num nodes: 3
>> 
>> Data for node: Name: barney                 Launch id: -1      Arch: ffc91200   State: 2
>>               Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>               Daemon: [[22904,0],0]  Daemon launched: True
>>               Num slots: 1      Slots in use: 2
>>               Num slots allocated: 1   Max slots: 0
>>               Username on node: NULL
>>               Num procs: 2     Next node_rank: 2
>>               Data for proc: [[22904,1],0]
>>                              Pid: 0     Local rank: 0       Node rank: 0
>>                              State: 0                App_context: 0       Slot list: NULL
>>               Data for proc: [[22904,1],3]
>>                              Pid: 0     Local rank: 1       Node rank: 1
>>                              State: 0                App_context: 0       Slot list: NULL
>> 
>> Data for node: Name: carl.fft                  Launch id: -1      Arch: 0   State: 2
>>               Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>               Daemon: [[22904,0],1]  Daemon launched: False
>>               Num slots: 1      Slots in use: 1
>>               Num slots allocated: 1   Max slots: 0
>>               Username on node: NULL
>>               Num procs: 1     Next node_rank: 1
>>               Data for proc: [[22904,1],1]
>>                              Pid: 0     Local rank: 0       Node rank: 0
>>                              State: 0                App_context: 0       Slot list: NULL
>> 
>> Data for node: Name: charlie.fft                            Launch id: -1      Arch: 0   State: 2
>>               Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>               Daemon: [[22904,0],2]  Daemon launched: False
>>               Num slots: 2      Slots in use: 1
>>               Num slots allocated: 2   Max slots: 0
>>               Username on node: NULL
>>               Num procs: 1     Next node_rank: 1
>>               Data for proc: [[22904,1],2]
>>                              Pid: 0     Local rank: 0       Node rank: 0
>>                              State: 0                App_context: 0       Slot list: NULL
>> 
>> Has anyone already encountered the same behavior?
>> Is there a simpler fix than not using the tight integration mode between SGE
>> and OpenMPI?
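One possible workaround sketch (not taken from this thread, and untested here):
build an explicit Open MPI hostfile from SGE's own allocation inside the job
script and hand it to mpirun, optionally disabling the gridengine RAS component:

    # convert SGE's pe_hostfile ("host slots queue ...") into Open MPI's
    # "host slots=N" hostfile format
    awk '{print $1 " slots=" $2}' "$PE_HOSTFILE" > "$TMPDIR/ompi_hosts"
    # ./my_app is a placeholder for the actual binary; excluding the
    # gridengine RAS via the "^" MCA syntax is an assumption to verify
    mpirun -np "$NSLOTS" --hostfile "$TMPDIR/ompi_hosts" --mca ras "^gridengine" ./my_app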
>> 
>> Eloi
>> 


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

