Hi Ralph,

Thanks. Here is what I did, as Jeff suggested:

> What did this command line look like? Can you provide the configure line as 
> well? 

As in my previous post, here are the debug output, the script, and the app file:

(1) debug messages:
>>>
[yiguang@gulftown testdmp]$ ./test.bash
[gulftown:28340] mca: base: components_open: Looking for plm components
[gulftown:28340] mca: base: components_open: opening plm components
[gulftown:28340] mca: base: components_open: found loaded component rsh
[gulftown:28340] mca: base: components_open: component rsh has no register function
[gulftown:28340] mca: base: components_open: component rsh open function successful
[gulftown:28340] mca: base: components_open: found loaded component slurm
[gulftown:28340] mca: base: components_open: component slurm has no register function
[gulftown:28340] mca: base: components_open: component slurm open function successful
[gulftown:28340] mca: base: components_open: found loaded component tm
[gulftown:28340] mca: base: components_open: component tm has no register function
[gulftown:28340] mca: base: components_open: component tm open function successful
[gulftown:28340] mca:base:select: Auto-selecting plm components
[gulftown:28340] mca:base:select:(  plm) Querying component [rsh]
[gulftown:28340] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[gulftown:28340] mca:base:select:(  plm) Querying component [slurm]
[gulftown:28340] mca:base:select:(  plm) Skipping component [slurm]. Query failed to return a module
[gulftown:28340] mca:base:select:(  plm) Querying component [tm]
[gulftown:28340] mca:base:select:(  plm) Skipping component [tm]. Query failed to return a module
[gulftown:28340] mca:base:select:(  plm) Selected component [rsh]
[gulftown:28340] mca: base: close: component slurm closed
[gulftown:28340] mca: base: close: unloading component slurm
[gulftown:28340] mca: base: close: component tm closed
[gulftown:28340] mca: base: close: unloading component tm
[gulftown:28340] plm:base:set_hnp_name: initial bias 28340 nodename hash 3546479048
[gulftown:28340] plm:base:set_hnp_name: final jobfam 17438
[gulftown:28340] [[17438,0],0] plm:base:receive start comm
[gulftown:28340] [[17438,0],0] plm:rsh: setting up job [17438,1]
[gulftown:28340] [[17438,0],0] plm:base:setup_job for job [17438,1]
[gulftown:28340] [[17438,0],0] plm:rsh: local shell: 0 (bash)
[gulftown:28340] [[17438,0],0] plm:rsh: assuming same remote shell as local shell
[gulftown:28340] [[17438,0],0] plm:rsh: remote shell: 0 (bash)
[gulftown:28340] [[17438,0],0] plm:rsh: final template argv:
        /usr/bin/rsh <template>  orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100
[gulftown:28340] [[17438,0],0] plm:rsh:launch daemon already exists on node gulftown
[gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode001
[gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],1]
[gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode001  orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
bash: orted: command not found
[gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode002
[gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],2]
[gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode002  orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
bash: orted: command not found
[gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode003
[gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode003  orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
[gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],3]
bash: orted: command not found
[gulftown:28340] [[17438,0],0] plm:base:daemon_callback
<<<

(2) test.bash script:
>>>
#!/bin/sh -f
#nohup
#
# >-------------------------------------------------------------------------------------------<
adinahome=/usr/adina/system8.8dmp
mpirunfile=$adinahome/bin/mpirun
#
# Set envars for mpirun and orted
#
export PATH=$adinahome/bin:$adinahome/tools:$PATH
export LD_LIBRARY_PATH=$adinahome/lib:$LD_LIBRARY_PATH
#
#
# run DMP problem
#
mcaprefix="--prefix $adinahome"
mcarshagent="--mca plm_rsh_agent rsh:ssh"
mcatmpdir="--mca orte_tmpdir_base /tmp"
mcaopenibmsg="--mca btl_openib_warn_default_gid_prefix 0"
mcaenvars="-x PATH -x LD_LIBRARY_PATH"
mcabtlconn="--mca btl openib,sm,self"
mcaplmbase="--mca plm_base_verbose 100"

mcaparams="$mcaprefix $mcaenvars $mcarshagent $mcaopenibmsg $mcabtlconn $mcatmpdir $mcaplmbase"

$mpirunfile $mcaparams --app addmpw-hostname
<<<
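Since Jeff asked what the command line looked like: below is a dry-run copy of test.bash that assembles the same argument list and prints it with echo instead of launching mpirun. It is a sketch reconstructed by hand from the script above, not output captured on gulftown, and print_mpirun_cmd is only an illustrative function name.

```shell
#!/bin/bash
# Dry run of test.bash: build the same mpirun argument list and print it
# (echo stands in for the real launch; print_mpirun_cmd is an illustrative name).
adinahome=/usr/adina/system8.8dmp
mpirunfile=$adinahome/bin/mpirun

mcaprefix="--prefix $adinahome"
mcarshagent="--mca plm_rsh_agent rsh:ssh"
mcatmpdir="--mca orte_tmpdir_base /tmp"
mcaopenibmsg="--mca btl_openib_warn_default_gid_prefix 0"
mcaenvars="-x PATH -x LD_LIBRARY_PATH"
mcabtlconn="--mca btl openib,sm,self"
mcaplmbase="--mca plm_base_verbose 100"

mcaparams="$mcaprefix $mcaenvars $mcarshagent $mcaopenibmsg $mcabtlconn $mcatmpdir $mcaplmbase"

print_mpirun_cmd() {
  # $mcaparams is left unquoted on purpose, exactly as in test.bash,
  # so the shell splits it into separate arguments.
  echo "$mpirunfile" $mcaparams --app addmpw-hostname
}

print_mpirun_cmd
```

Run as-is, it should print a single line beginning `/usr/adina/system8.8dmp/bin/mpirun --prefix /usr/adina/system8.8dmp -x PATH -x LD_LIBRARY_PATH` and ending `--mca plm_base_verbose 100 --app addmpw-hostname`.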

(3) contents of the app file addmpw-hostname:
>>>
-n 1 -host gulftown hostname
-n 1 -host ibnode001 hostname
-n 1 -host ibnode002 hostname
-n 1 -host ibnode003 thostname
<<<
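All three remote launches fail with "bash: orted: command not found", even though the script passes --prefix and sets PATH. To check each node outside mpirun, a small loop can ask it, through the same /usr/bin/rsh agent shown in the log, whether orted is on the PATH of a non-interactive shell. This is only a sketch: the script name check_orted.sh and the REMOTE_SHELL override are hypothetical names of my own, and it assumes rsh to the nodes works without a password, as the log seems to show.

```shell
#!/bin/bash
# Hypothetical helper (check_orted.sh): ask each node, over the same kind of
# non-interactive remote shell that mpirun's rsh launcher uses, whether
# "orted" can be found in that shell's PATH.
# Usage: ./check_orted.sh ibnode001 ibnode002 ibnode003

# Assumption: same agent as in the log; REMOTE_SHELL is a hypothetical override.
remote_shell=${REMOTE_SHELL:-/usr/bin/rsh}

check_node() {
  local node=$1 out
  # Many rsh implementations do not pass back the remote exit status,
  # so success is judged by whether "command -v" printed a path.
  out=$("$remote_shell" "$node" 'command -v orted' 2>/dev/null)
  if [ -n "$out" ]; then
    echo "$node: orted found at $out"
  else
    echo "$node: orted not found in the non-interactive PATH"
    return 1
  fi
}

check_nodes() {
  local node status=0
  for node in "$@"; do
    check_node "$node" || status=1
  done
  return "$status"
}

# Only run the check when node names are given on the command line.
if [ "$#" -gt 0 ]; then
  check_nodes "$@"
  exit $?
fi
```

A node that prints "not found" would match the failure in the log; comparing its non-interactive PATH with the one on gulftown may help narrow things down.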

Any comments?

Thanks,
Yiguang
