Hi Reuti,

I configured Open MPI to support SGE tight integration and used the PE defined 
below for submitting the job:

[16:36][eg@moe:~]$ qconf -sp fill_up
pe_name            fill_up
slots              80
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
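
For completeness, the build and submission steps looked roughly like this (the 
install prefix, slot count and job script below are placeholders, not 
necessarily our exact setup):

  # build Open MPI 1.4.4 with gridengine support
  ./configure --with-sge --prefix=/opt/openmpi-1.4.4
  make all install

  # submit through the PE defined above, requesting 4 slots
  qsub -pe fill_up 4 -cwd ./run_job.sh

With control_slaves TRUE, mpirun starts its remote daemons via qrsh so that 
SGE can account for and control them.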

Here is the allocation info retrieved from `qstat -g t` for the job in question:
---------------------------------------------------------------------------------
smp...@barney.fft              BIP   0/1/4          0.70     lx-amd64
        hc:num_proc=0
        hl:mem_free=31.215G
        hl:mem_used=280.996M
        hc:mem_available=1.715G
   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     1
---------------------------------------------------------------------------------
smp...@carl.fft                BIP   0/1/4          0.69     lx-amd64
        hc:num_proc=0
        hl:mem_free=30.764G
        hl:mem_used=742.805M
        hc:mem_available=1.715G
   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     1
---------------------------------------------------------------------------------
smp...@charlie.fft             BIP   0/2/8          0.57     lx-amd64
        hc:num_proc=0
        hl:mem_free=62.234G
        hl:mem_used=836.797M
        hc:mem_available=4.018G
   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     2
---------------------------------------------------------------------------------

SGE reports what pls_gridengine_info reports, i.e. what was reserved.
But here is the output of the current job (after it was started by Open MPI):
[charlie:05294] ras:gridengine: JOB_ID: 1296
[charlie:05294] ras:gridengine: PE_HOSTFILE: 
/opt/sge/default/spool/charlie/active_jobs/1296.1/pe_hostfile
[charlie:05294] ras:gridengine: charlie.fft: PE_HOSTFILE shows slots=2
[charlie:05294] ras:gridengine: barney.fft: PE_HOSTFILE shows slots=1
[charlie:05294] ras:gridengine: carl.fft: PE_HOSTFILE shows slots=1
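
For reference, the pe_hostfile parsed above is a plain-text file with one line 
per granted host: hostname, slot count, queue instance and processor range. 
Reconstructed from the qstat output, ours should look something like this (the 
queue instance names are truncated in the qstat listing, so I keep them 
truncated here):

  charlie.fft 2 smp...@charlie.fft <NULL>
  barney.fft 1 smp...@barney.fft <NULL>
  carl.fft 1 smp...@carl.fft <NULL>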

======================   ALLOCATED NODES   ======================

 Data for node: Name: charlie   Launch id: -1 Arch: ffc91200  State: 2
  Num boards: 1 Num sockets/board: 2  Num cores/socket: 4
  Daemon: [[54347,0],0] Daemon launched: True
  Num slots: 2  Slots in use: 0
  Num slots allocated: 2  Max slots: 0
  Username on node: NULL
  Num procs: 0  Next node_rank: 0
 Data for node: Name: barney.fft    Launch id: -1 Arch: 0 State: 2
  Num boards: 1 Num sockets/board: 2  Num cores/socket: 4
  Daemon: Not defined Daemon launched: False
  Num slots: 1  Slots in use: 0
  Num slots allocated: 1  Max slots: 0
  Username on node: NULL
  Num procs: 0  Next node_rank: 0
 Data for node: Name: carl.fft    Launch id: -1 Arch: 0 State: 2
  Num boards: 1 Num sockets/board: 2  Num cores/socket: 4
  Daemon: Not defined Daemon launched: False
  Num slots: 1  Slots in use: 0
  Num slots allocated: 1  Max slots: 0
  Username on node: NULL
  Num procs: 0  Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0200
  Npernode: 0 Oversubscribe allowed: TRUE CPU Lists: FALSE
  Num new daemons: 2  New daemon starting vpid 1
  Num nodes: 3

 Data for node: Name: charlie   Launch id: -1 Arch: ffc91200  State: 2
  Num boards: 1 Num sockets/board: 2  Num cores/socket: 4
  Daemon: [[54347,0],0] Daemon launched: True
  Num slots: 2  Slots in use: 2
  Num slots allocated: 2  Max slots: 0
  Username on node: NULL
  Num procs: 2  Next node_rank: 2
  Data for proc: [[54347,1],0]
    Pid: 0  Local rank: 0 Node rank: 0
    State: 0  App_context: 0  Slot list: NULL
  Data for proc: [[54347,1],3]
    Pid: 0  Local rank: 1 Node rank: 1
    State: 0  App_context: 0  Slot list: NULL
 Data for node: Name: barney.fft    Launch id: -1 Arch: 0 State: 2
  Num boards: 1 Num sockets/board: 2  Num cores/socket: 4
  Daemon: [[54347,0],1] Daemon launched: False
  Num slots: 1  Slots in use: 1
  Num slots allocated: 1  Max slots: 0
  Username on node: NULL
  Num procs: 1  Next node_rank: 1
  Data for proc: [[54347,1],1]
    Pid: 0  Local rank: 0 Node rank: 0
    State: 0  App_context: 0  Slot list: NULL

 Data for node: Name: carl.fft    Launch id: -1 Arch: 0 State: 2
  Num boards: 1 Num sockets/board: 2  Num cores/socket: 4
  Daemon: [[54347,0],2] Daemon launched: False
  Num slots: 1  Slots in use: 1
  Num slots allocated: 1  Max slots: 0
  Username on node: NULL
  Num procs: 1  Next node_rank: 1
  Data for proc: [[54347,1],2]
    Pid: 0  Local rank: 0 Node rank: 0
    State: 0  App_context: 0  Slot list: NULL
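
For anyone wanting to reproduce these dumps: assuming Open MPI 1.4.x, an 
equivalent command line should be something like the following (the verbosity 
level and application name are placeholders):

  mpirun --mca ras_gridengine_verbose 100 \
         -display-allocation -display-map ./my_app

-display-allocation prints the "ALLOCATED NODES" section and -display-map the 
mapping section above.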

Regards,
Eloi



-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Reuti
Sent: Tuesday, April 3, 2012 16:24
To: Open MPI Users
Subject: Re: [OMPI users] sge tight integration leads to bad allocation

Hi,

On 03.04.2012 at 16:12, Eloi Gaudry wrote:

> Thanks for your feedback.
> No, it's the other way around: the "reserved" slots on all nodes are OK, 
> but the "used" slots are different.
>  
> Basically, I'm using SGE to schedule and book resources for a distributed 
> job. When the job is finally launched, it uses a different allocation than 
> the one that was reported by pls_gridengine_info.
>  
> pls_gridengine_info reports that 3 nodes were booked: barney (1 slot), 
> carl (1 slot) and charlie (2 slots). This booking was done by SGE based on 
> the memory requirements of the job (among other criteria).
>  
> When orterun starts the job (i.e. when SGE finally starts the scheduled job), 
> it uses 3 nodes, but the first one (barney: 2 slots used instead of 1) is 
> oversubscribed and the last one (charlie: 1 slot used instead of 2) is underused.

You configured Open MPI to support SGE tight integration and used a PE for 
submitting the job? Can you please post the definition of the PE?

What was the allocation you saw in SGE's `qstat -g t` for the job?

-- Reuti


> If you need further information, please let me know.
>  
> Eloi
>  
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Tuesday, April 3, 2012 15:58
> To: Open MPI Users
> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
>  
> I'm afraid there isn't enough info here to help. Are you saying you only 
> allocated one slot per node, so the two slots on charlie are in error?
> 
> Sent from my iPad
> 
> On Apr 3, 2012, at 6:23 AM, "Eloi Gaudry" <eloi.gau...@fft.be> wrote:
> 
> Hi,
>  
> I've observed a strange behavior during rank allocation on a distributed run 
> scheduled and submitted using SGE (Son of Grid Engine 8.0.0d) and Open MPI 1.4.4.
> Briefly, there is a one-slot difference between the slots allocated by SGE 
> and those used by Open MPI. The issue is that one node becomes oversubscribed 
> at runtime.
>  
> Here is the output of the allocation done for gridengine:
>  
> ======================   ALLOCATED NODES   ======================
>  
> Data for node: Name: barney       Launch id: -1  Arch: ffc91200  State: 2
>                Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>                Daemon: [[22904,0],0]  Daemon launched: True
>                Num slots: 1  Slots in use: 0
>                Num slots allocated: 1  Max slots: 0
>                Username on node: NULL
>                Num procs: 0  Next node_rank: 0
> Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
>                Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>                Daemon: Not defined  Daemon launched: False
>                Num slots: 1  Slots in use: 0
>                Num slots allocated: 1  Max slots: 0
>                Username on node: NULL
>                Num procs: 0  Next node_rank: 0
> Data for node: Name: charlie.fft  Launch id: -1  Arch: 0  State: 2
>                Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>                Daemon: Not defined  Daemon launched: False
>                Num slots: 2  Slots in use: 0
>                Num slots allocated: 2  Max slots: 0
>                Username on node: NULL
>                Num procs: 0  Next node_rank: 0
>  
>  
> And here is the allocation finally used:
> =================================================================
>  
> Map generated by mapping policy: 0200
>                Npernode: 0  Oversubscribe allowed: TRUE  CPU Lists: FALSE
>                Num new daemons: 2  New daemon starting vpid 1
>                Num nodes: 3
> 
> Data for node: Name: barney       Launch id: -1  Arch: ffc91200  State: 2
>                Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>                Daemon: [[22904,0],0]  Daemon launched: True
>                Num slots: 1  Slots in use: 2
>                Num slots allocated: 1  Max slots: 0
>                Username on node: NULL
>                Num procs: 2  Next node_rank: 2
>                Data for proc: [[22904,1],0]
>                               Pid: 0  Local rank: 0  Node rank: 0
>                               State: 0  App_context: 0  Slot list: NULL
>                Data for proc: [[22904,1],3]
>                               Pid: 0  Local rank: 1  Node rank: 1
>                               State: 0  App_context: 0  Slot list: NULL
> 
> Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
>                Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>                Daemon: [[22904,0],1]  Daemon launched: False
>                Num slots: 1  Slots in use: 1
>                Num slots allocated: 1  Max slots: 0
>                Username on node: NULL
>                Num procs: 1  Next node_rank: 1
>                Data for proc: [[22904,1],1]
>                               Pid: 0  Local rank: 0  Node rank: 0
>                               State: 0  App_context: 0  Slot list: NULL
> 
> Data for node: Name: charlie.fft  Launch id: -1  Arch: 0  State: 2
>                Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>                Daemon: [[22904,0],2]  Daemon launched: False
>                Num slots: 2  Slots in use: 1
>                Num slots allocated: 2  Max slots: 0
>                Username on node: NULL
>                Num procs: 1  Next node_rank: 1
>                Data for proc: [[22904,1],2]
>                               Pid: 0  Local rank: 0  Node rank: 0
>                               State: 0  App_context: 0  Slot list: NULL
>  
> Has anyone already encountered the same behavior?
> Is there a simpler fix than not using the tight integration mode between SGE 
> and Open MPI?
>  
> Eloi
>  
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

