Hi

Reuti wrote:
>>
>> ./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64
>> --with-sge --enable-shared --enable-static --host=x86_64-linux
>> --build=x86_64-linux NM=x86_64-linux-nm
> 
> Is there any list of valid values for --host, --build and NM - and what
> is NM for? From the ./configure --help I would "assume" that one can
> tell Open MPI to prepare to BUILD on a PPC platform, although I'm
> issuing the command on a x86, and the result of the PPC compile should
> be to run on x86_64. Maybe you can leave it out, as it's the same in
> your case?

That is not the problem... We have both 32-bit and 64-bit machines, and the
problem occurs on both (also when --host, --build, etc. are omitted)...

> 
>> Is there any way to force the ssh before the (...) term???
> 
> Using SSH directly would bypass SGE's startup. What are your entries for
> qrsh_daemon and so on in SGE's configuration? Which version of SGE?

qstat reports the version as "GE 6.2u4". Below is the qconf -sconf dump.

> 
> But I think the real problem is that Open MPI assumes you are outside
> of SGE and so uses a different startup. Are you resetting any of SGE's
> environment variables in your custom starter method (like $JOB_ID)?

I don't think Open MPI fails to recognize SGE when it calls
starter.sh...


The starter.sh looks like this:

$$$
#!/bin/sh

ulimit -S -c 0
ulimit -S -t unlimited

#echo "$@" >>/pub/tmp/starter.log

# start the job in this shell
exec "$@"


So there is no resetting of any kind. The ompi_info output also looks OK:

$$$
ompi_info | grep gridengine
                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
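To rule out the environment question completely, the starter could also log which SGE variables it actually receives. A minimal sketch follows; the function name report_sge_env is hypothetical, and the variable list (JOB_ID, SGE_ROOT, ARC, PE_HOSTFILE, NSLOTS) is my assumption about what matters for Open MPI's SGE detection, not something taken from the Open MPI sources.

```shell
#!/bin/sh
# Hypothetical helper: print each SGE-related variable as NAME=value,
# or NAME=<unset> if it is not set at all.
# Assumption: this variable list is illustrative; adjust it as needed.
report_sge_env() {
  for name in JOB_ID SGE_ROOT ARC PE_HOSTFILE NSLOTS; do
    # ${VAR+x} expands to "x" only when VAR is set (even if empty).
    if eval "[ -n \"\${$name+x}\" ]"; then
      eval "value=\${$name}"
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s=<unset>\n' "$name"
    fi
  done
}
```

It could be called as `report_sge_env >>/pub/tmp/starter.log` just before the `exec` line, reusing the log file from the commented-out echo.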


$$$
qconf -sconf:
qconf -sconf
#global:
execd_spool_dir              /usr/local/share/SGE/default/spool
mailer                       /bin/mail
xterm                        /usr/bin/xterm
load_sensor                  /usr/local/share/SGE/util/disk.sh
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,ksh,csh,tcsh,bash
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:30
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           li...@fit.vutbr.cz
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
reporting_params             accounting=true reporting=false \
                             flush_time=00:00:15 joblog=false \
                             sharelog=00:00:00
finished_jobs                20
gid_range                    20000-20100
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_daemon                builtin
max_aj_instances             2000
max_aj_tasks                 90000
max_u_jobs                   0
max_jobs                     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    STD
auto_user_delete_time        0
delegated_file_staging       false
rsh_daemon                   builtin
rsh_command                  builtin
rlogin_command               builtin
reprioritize                 0
jsv_url                      none
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
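The entries that decide how remote processes are started are the *_command/*_daemon pairs; here all of them are builtin. As far as I know, rsh_command/rsh_daemon are the ones used when qrsh is given a command, which is the `qrsh -inherit` case, while starter_method is a queue setting, so it shows up in `qconf -sq <queue>` rather than in this dump. A small filter to pull just those lines out, as a sketch (remote_startup_settings is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical filter: keep only the qlogin/rlogin/rsh command and daemon
# lines from `qconf -sconf` output read on stdin.
remote_startup_settings() {
  grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}
```

Usage: `qconf -sconf | remote_startup_settings`.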


Thanx

> 
> -- Reuti
> 
> 
>>
>> Thanx
>> Ondrej
>>
>>
>> Reuti wrote:
>>> On 30.11.2009 at 18:46, Ondrej Glembek wrote:
>>>> Hi, thanx for reply...
>>>>
>>>> I tried to dump the $@ before calling the exec and here it is:
>>>>
>>>>
>>>> ( test ! -r ./.profile || . ./.profile;
>>>> PATH=/homes/kazi/glembek/share/openmpi-1.3.3-64/bin:$PATH ; export
>>>> PATH ;
>>>> LD_LIBRARY_PATH=/homes/kazi/glembek/share/openmpi-1.3.3-64/lib:$LD_LIBRARY_PATH
>>>> ; export LD_LIBRARY_PATH ;
>>>> /homes/kazi/glembek/share/openmpi-1.3.3-64/bin/orted -mca ess env
>>>> -mca orte_ess_jobid 3870359552 -mca orte_ess_vpid 1 -mca
>>>> orte_ess_num_procs 2 --hnp-uri
>>>> "3870359552.0;tcp://147.229.8.134:53727" --mca
>>>> pls_gridengine_verbose 1 --output-filename mpi.log )
>>>>
>>>>
>>>> It looks like the line gets constructed in
>>>> orte/mca/plm/rsh/plm_rsh_module.c and depends on the shell...
>>>>
>>>> Still, I wonder why mpiexec calls starter.sh at all... I thought the
>>>> starter was supposed to call the job script, which in turn wraps the
>>>> call to mpiexec...
>>> Correct. This will happen for the master node of this job, i.e. where
>>> the jobscript is executed. But it will also be used for the qrsh
>>> -inherit calls. I wonder about one thing: I see only a call to
>>> "orted" and not the above sub-shell on my machines. Did you compile
>>> Open MPI with --with-sge?
>>> The original call above would be "ssh node_xy ( test ! ....)", which
>>> seems to work for ssh and rsh.
>>> Just one note: with the starter script you will lose the PATH and
>>> LD_LIBRARY_PATH set above, as a new shell is created. It might be
>>> necessary to set them again in your starter method.
>>> -- Reuti
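Following Reuti's note, the starter could re-export both variables itself. A sketch, assuming the install prefix from my configure line at the top of this mail; setup_openmpi_env is a hypothetical helper name, and the prefix can be overridden with the first argument.

```shell
#!/bin/sh
# Hypothetical helper: put the Open MPI bin/ and lib/ directories back
# on PATH and LD_LIBRARY_PATH inside the starter's new shell.
setup_openmpi_env() {
  # Assumption: default prefix taken from the configure --prefix above.
  ompi_prefix=${1:-/homes/kazi/glembek/share/openmpi-1.3.3-64}
  PATH="$ompi_prefix/bin:$PATH"
  # Append the old value only if it was set, to avoid an empty entry.
  LD_LIBRARY_PATH="$ompi_prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export PATH LD_LIBRARY_PATH
}
```

In starter.sh, `setup_openmpi_env` would go before the `exec` line.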
>>>>
>>>> Am I not right???
>>>> Ondrej
>>>>
>>>>
>>>> Reuti wrote:
>>>>> Hi,
>>>>> On 30.11.2009 at 16:33, Ondrej Glembek wrote:
>>>>>> we are using a custom starter method in our SGE to launch our
>>>>>> jobs... It looks something like this:
>>>>>>
>>>>>> #!/bin/sh
>>>>>>
>>>>>> # ... we do whole bunch of stuff here
>>>>>>
>>>>>> # start the job in this shell
>>>>>> exec "$@"
>>>>> The "$@" should be replaced by the path to the jobscript (qsub) or
>>>>> command (qrsh) plus the given options.
>>>>> For the tasks spread to other nodes, I get as arguments: " orted -mca
>>>>> ess env -mca orte_ess_jobid ...". There is also no . ./.profile.
>>>>> So I wonder where the . ./.profile is coming from. Can you put a
>>>>> `sleep 60` or similar before the `exec ...` and grep the constructed
>>>>> line from `ps -e f` before it crashes?
>>>>> -- Reuti
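One caveat with the commented-out `echo "$@"` in the starter: echo joins the arguments with spaces, so it cannot show where one argument ends and the next begins, which is exactly what matters here (is "(" a separate word, or part of one long string?). A sketch of a helper that prints the argument count and then each argument in brackets on its own line; dump_args is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: show the exact argument boundaries the starter got.
dump_args() {
  printf 'argc=%s\n' "$#"
  for arg in "$@"; do
    # Brackets make leading/trailing spaces and empty arguments visible.
    printf '[%s]\n' "$arg"
  done
}
```

In starter.sh this would be `dump_args "$@" >>/pub/tmp/starter.log; sleep 60` before the exec, followed by a look at `ps -e f` on the node, as Reuti suggests.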
>>>>>> The trouble is that mpiexec passes a command which looks like this:
>>>>>>
>>>>>> ( . ./.profile ..... )
>>>>>>
>>>>>> which, however, is not a valid exec argument...
>>>>>>
>>>>>> Is there any way to tell mpiexec to run it in a separate script???
>>>>>> Any idea how to solve this???
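A possible workaround on the starter side, as an untested sketch: if the first argument starts with "(", hand the whole argument list to /bin/sh -c so that a shell parses the subshell syntax; otherwise exec as before. This is not an SGE or Open MPI feature, and start_job is a hypothetical name. It assumes the words arrive unparsed, as the dump above suggests (the --hnp-uri value still carries its literal double quotes). If that quoting had already been stripped, rejoining the words with $* would expose the ";" in that value to the shell and break the command.

```shell
#!/bin/sh
# Hypothetical starter.sh variant (sketch only, not a documented fix).

start_job() {
  case "$1" in
    \(*)
      # Shell syntax such as "( test ! -r ./.profile || ... )":
      # join the words with spaces and let /bin/sh parse them.
      exec /bin/sh -c "$*"
      ;;
    *)
      # Ordinary command (e.g. the job script): exec it directly.
      exec "$@"
      ;;
  esac
}

# ... site-specific setup (ulimit etc.) goes here ...

start_job "$@"
```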
>>>>>>
>>>>>> Thanx
>>>>>> Ondrej Glembek
>>>>>>
>>>>>> -- 
>>>>>>
>>>>>>   Ondrej Glembek, PhD student  E-mail: glem...@fit.vutbr.cz
>>>>>>   UPGM FIT VUT Brno, L226      Web:    http://www.fit.vutbr.cz/~glembek
>>>>>>   Bozetechova 2, 612 66        Phone:  +420 54114-1292
>>>>>>   Brno, Czech Republic         Fax:    +420 54114-1290
>>>>>>
>>>>>>   ICQ: 93233896
>>>>>>   GPG: C050 A6DC 7291 6776 9B69 BB11 C033 D756 6F33 DE3C
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
