Hi there,

I'm observing something strange on our cluster managed by SGE 6.2u4 when 
launching a parallel computation on several nodes, using OpenMPI/SGE tight-
integration mode (OpenMPI-1.3.3). It seems that the SGE-allocated slots are 
not used by OpenMPI, as if OpenMPI were doing its own round-robin allocation 
based on the allocated node hostnames.

Here is what I'm doing:
- launch a parallel computation involving 8 processes, each using 14GB of 
memory. I'm using a qsub command where I request the memory_free resource 
and use tight integration with OpenMPI (a simplified submission script is 
sketched after this list)
- 3 servers are available:
. barney with 4 cores (4 slots) and 32GB
. carl with 4 cores (4 slots) and 32GB
. charlie with 8 cores (8 slots) and 64GB
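
For reference, here is a simplified sketch of the submission script I'm using 
(job name and application name are placeholders, not the real ones):

#!/bin/bash
#$ -N mpi_job                  # job name (placeholder)
#$ -pe round_robin 8           # request 8 slots from the round_robin PE
#$ -l memory_free=14G          # request 14GB of free memory per slot
#$ -cwd

# tight integration: no hostfile is passed, orterun is expected to pick up
# the SGE allocation by itself
orterun --bynode -np 8 ./my_mpi_application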

Here is the output of the allocated nodes (OpenMPI output):
======================   ALLOCATED NODES   ======================

 Data for node: Name: charlie   Launch id: -1 Arch: ffc91200  State: 2
  Daemon: [[44332,0],0] Daemon launched: True
  Num slots: 4  Slots in use: 0
  Num slots allocated: 4  Max slots: 0
  Username on node: NULL
  Num procs: 0  Next node_rank: 0
 Data for node: Name: carl.fft    Launch id: -1 Arch: 0 State: 2
  Daemon: Not defined Daemon launched: False
  Num slots: 2  Slots in use: 0
  Num slots allocated: 2  Max slots: 0
  Username on node: NULL
  Num procs: 0  Next node_rank: 0
 Data for node: Name: barney.fft    Launch id: -1 Arch: 0 State: 2
  Daemon: Not defined Daemon launched: False
  Num slots: 2  Slots in use: 0
  Num slots allocated: 2  Max slots: 0
  Username on node: NULL
  Num procs: 0  Next node_rank: 0

=================================================================

Here is what I see when my computation is running on the cluster:
#     rank       pid          hostname
         0     28112          charlie
         1     11417          carl
         2     11808          barney
         3     28113          charlie
         4     11418          carl
         5     11809          barney
         6     28114          charlie
         7     11419          carl

Note that the parallel environment used under SGE is defined as:
[eg@moe:~]$ qconf -sp round_robin
pe_name            round_robin
slots              32
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
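
If I read the SGE side correctly, the memory_free request is what limits barney 
and carl to 2 slots each (2 x 14GB fits in 32GB, a third slot would not) while 
charlie gets 4, which matches the slot counts in the "ALLOCATED NODES" report. 
From inside the running job I can check what SGE actually granted using the 
standard SGE environment variables:

cat $PE_HOSTFILE   # one line per host: hostname, slot count, queue, processor range
echo $NSLOTS       # total number of granted slots (8 here)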

I'm wondering why OpenMPI didn't use the slot allocation chosen by SGE (cf. the 
"ALLOCATED NODES" report above) but instead placed the processes one node at a 
time, in a round-robin fashion.

Note that I'm using the '--bynode' option on the orterun command line. If the 
behavior I'm observing is simply the consequence of using this option, please 
let me know. That would mean that, under SGE tight integration, the orterun 
command-line options take precedence over the SGE slot allocation, which would 
be worth documenting.
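
For completeness, here is how I understand the two mapping modes ('./solver' is 
just a placeholder for our application):

orterun --bynode -np 8 ./solver   # what I use: ranks placed round-robin, one per node at a time
orterun -np 8 ./solver            # default (--byslot): fill each node's slots before moving on

If --bynode is indeed allowed to cycle past a node's granted slot count, that 
would explain carl ending up with 3 processes while only 2 slots were allocated 
to it.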

Any help would be appreciated,
Thanks,
Eloi


-- 


Eloi Gaudry

Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM

Company Phone: +32 10 487 959
Company Fax:   +32 10 454 626
