Hi,

On 25.05.2010 at 09:14, Eloi Gaudry wrote:
> I do not reset any environment variable during job submission or job handling.
> Is there a simple way to check that openmpi is working as expected with SGE
> tight integration (e.g. by displaying environment variables, setting options
> on the command line, etc.)?

a) Put the command `env` in the jobscript and check the output for $JOB_ID and
the various $SGE_* variables.

b) To confirm the misbehavior: are the tasks on the slave nodes children of
sge_shepherd, or of some system sshd/rshd? (A minimal jobscript sketch
illustrating both checks is appended at the end of this message.)

-- Reuti

> Regards,
> Eloi
>
>
> On Friday 21 May 2010 17:35:24 Reuti wrote:
>> Hi,
>>
>> On 21.05.2010 at 17:19, Eloi Gaudry wrote:
>>> Hi Reuti,
>>>
>>> Yes, the openmpi binaries used were built after having used the
>>> --with-sge option during configure, and we only use those binaries on
>>> our cluster.
>>>
>>> [eg@moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
>>>          MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
>>
>> OK. As you have Tight Integration as the goal and set "control_slaves TRUE"
>> in your PE, SGE wouldn't allow `qrsh -inherit ...` to nodes which are not
>> in the list of granted nodes. So it looks like your job is running outside
>> of this Tight Integration with its own `rsh` or `ssh`.
>>
>> Do you reset $JOB_ID or other environment variables in your jobscript,
>> which could trigger Open MPI to assume that it's not running inside SGE?
>>
>> -- Reuti
>>
>>> On Friday 21 May 2010 16:01:54 Reuti wrote:
>>>> Hi,
>>>>
>>>> On 21.05.2010 at 14:11, Eloi Gaudry wrote:
>>>>> Hi there,
>>>>>
>>>>> I'm observing something strange on our cluster managed by SGE 6.2u4
>>>>> when launching a parallel computation on several nodes, using the
>>>>> OpenMPI/SGE tight-integration mode (OpenMPI-1.3.3). It seems that the
>>>>> SGE-allocated slots are not used by OpenMPI, as if OpenMPI was doing
>>>>> its own round-robin allocation based on the allocated node hostnames.
>>>>
>>>> You compiled Open MPI with --with-sge (and recompiled your
>>>> applications)? You are using the correct mpiexec?
>>>>
>>>> -- Reuti
>>>>
>>>>> Here is what I'm doing:
>>>>> - launch a parallel computation involving 8 processors, using 14GB of
>>>>>   memory for each of them. I'm using a qsub command where I request
>>>>>   the memory_free resource and use tight integration with openmpi
>>>>> - 3 servers are available:
>>>>>   . barney with 4 cores (4 slots) and 32GB
>>>>>   . carl with 4 cores (4 slots) and 32GB
>>>>>   . charlie with 8 cores (8 slots) and 64GB
>>>>>
>>>>> Here is the output of the allocated nodes (OpenMPI output):
>>>>>
>>>>> ======================  ALLOCATED NODES  ======================
>>>>>
>>>>>  Data for node: Name: charlie     Launch id: -1  Arch: ffc91200  State: 2
>>>>>    Daemon: [[44332,0],0]  Daemon launched: True
>>>>>    Num slots: 4  Slots in use: 0
>>>>>    Num slots allocated: 4  Max slots: 0
>>>>>    Username on node: NULL
>>>>>    Num procs: 0  Next node_rank: 0
>>>>>
>>>>>  Data for node: Name: carl.fft    Launch id: -1  Arch: 0  State: 2
>>>>>    Daemon: Not defined  Daemon launched: False
>>>>>    Num slots: 2  Slots in use: 0
>>>>>    Num slots allocated: 2  Max slots: 0
>>>>>    Username on node: NULL
>>>>>    Num procs: 0  Next node_rank: 0
>>>>>
>>>>>  Data for node: Name: barney.fft  Launch id: -1  Arch: 0  State: 2
>>>>>    Daemon: Not defined  Daemon launched: False
>>>>>    Num slots: 2  Slots in use: 0
>>>>>    Num slots allocated: 2  Max slots: 0
>>>>>    Username on node: NULL
>>>>>    Num procs: 0  Next node_rank: 0
>>>>>
>>>>> =================================================================
>>>>>
>>>>> Here is what I see when my computation is running on the cluster:
>>>>>
>>>>>  # rank   pid     hostname
>>>>>    0      28112   charlie
>>>>>    1      11417   carl
>>>>>    2      11808   barney
>>>>>    3      28113   charlie
>>>>>    4      11418   carl
>>>>>    5      11809   barney
>>>>>    6      28114   charlie
>>>>>    7      11419   carl
>>>>>
>>>>> Note that the parallel environment used under SGE is defined as:
>>>>>
>>>>> [eg@moe:~]$ qconf -sp round_robin
>>>>> pe_name            round_robin
>>>>> slots              32
>>>>> user_lists         NONE
>>>>> xuser_lists        NONE
>>>>> start_proc_args    /bin/true
>>>>> stop_proc_args     /bin/true
>>>>> allocation_rule    $round_robin
>>>>> control_slaves     TRUE
>>>>> job_is_first_task  FALSE
>>>>> urgency_slots      min
>>>>> accounting_summary FALSE
>>>>>
>>>>> I'm wondering why OpenMPI didn't use the allocated nodes chosen by SGE
>>>>> (cf. the "ALLOCATED NODES" report above) but instead placed the
>>>>> processes of the parallel computation one at a time, using a
>>>>> round-robin method.
>>>>>
>>>>> Note that I'm using the '--bynode' option on the orterun command line.
>>>>> If the behavior I'm observing is simply the consequence of using this
>>>>> option, please let me know. That would mean that the command line
>>>>> options take precedence over the SGE tight integration in determining
>>>>> orterun's behavior.
>>>>>
>>>>> Any help would be appreciated.
>>>>> Thanks,
>>>>> Eloi
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
>
> Eloi Gaudry
>
> Free Field Technologies
> Axis Park Louvain-la-Neuve
> Rue Emile Francqui, 1
> B-1435 Mont-Saint Guibert
> BELGIUM
>
> Company Phone: +32 10 487 959
> Company Fax:   +32 10 454 626
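
A minimal sketch of the two checks (a) and (b) above, assuming a jobscript
named check_tight.sh submitted into the round_robin PE from this thread. The
script name, the requested slot count and the grep pattern are illustrative
assumptions; the mpiexec path is the one quoted earlier in the thread.

    #!/bin/sh
    #$ -N check_tight
    #$ -pe round_robin 8    # PE name from the thread; slot count is an assumption
    #$ -cwd -j y

    # (a) Dump the SGE-related environment. With a working Tight Integration,
    #     JOB_ID, the SGE_* variables, PE_HOSTFILE and NSLOTS should all be set.
    env | grep -E '^(JOB_ID|SGE_|PE_HOSTFILE|NSLOTS)'
    echo "--- hosts granted by SGE ---"
    cat "$PE_HOSTFILE"

    # (b) Start an MPI run; /opt/openmpi-1.3.3/bin is the install path quoted
    #     in the thread. Inspect a slave node while this is running.
    /opt/openmpi-1.3.3/bin/mpiexec -np "$NSLOTS" hostname

Submit it with `qsub check_tight.sh`; while it runs, something like `ps -e f`
(or `pstree -p`) on one of the slave nodes should show the orted and
application processes as children of sge_execd/sge_shepherd rather than of
sshd or rshd if the Tight Integration is actually being used.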