I see the problem.

It looks like using an app context file triggers different behavior, and 
that behavior causes --prefix to be ignored.  If I replace the app context 
file with an equivalent full command line, it works and --prefix is honored.

Specifically:

$mpirunfile $mcaparams --app addmpw-hostname

^^ This one seems to ignore --prefix.

$mpirunfile $mcaparams --host svbu-mpi,svbu-mpi001 -np 2 hostname
$mpirunfile $mcaparams --host svbu-mpi -np 1 hostname : --host svbu-mpi001 -np 1 hostname

^^ These two seem to adhere to the proper --prefix behavior.
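Until the app-file path is fixed, one possible workaround (an untested sketch) 
is to flatten the app context file into the colon-separated command-line form 
above, since that form does honor --prefix.  The helper appfile_to_mpmd below 
is hypothetical (not part of Open MPI).  It assumes bash (for arrays), one app 
context per line, whitespace-separated arguments with no quoting, and that 
blank lines and lines starting with '#' should be skipped.

```shell
#!/bin/bash
# Hypothetical helper (not part of Open MPI): print the arguments of an
# app context file as one colon-separated MPMD argument list, one argument
# per output line, so it can be loaded into a bash array.
appfile_to_mpmd() {
  local appfile=$1 line sep=""
  local -a args=() words=()
  # "|| [ -n "$line" ]" also handles a last line with no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines and comment lines
    case $line in
      '' | '#'*) continue ;;
    esac
    # Separate consecutive app contexts with ":" as on the mpirun command line
    [ -n "$sep" ] && args+=(":")
    # Split the line on whitespace (quoted arguments are NOT supported)
    read -r -a words <<< "$line"
    args+=("${words[@]}")
    sep=1
  done < "$appfile"
  printf '%s\n' "${args[@]}"
}
```

Usage, in place of "--app addmpw-hostname" ($mcaparams is left unquoted on 
purpose so it word-splits, as in the original script):

  mapfile -t mpmd < <(appfile_to_mpmd addmpw-hostname)
  $mpirunfile $mcaparams "${mpmd[@]}"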

Ralph -- can you have a look?




On Mar 1, 2012, at 2:59 PM, Yiguang Yan wrote:

> Hi Ralph,
> 
> Thanks, here is what I did as suggested by Jeff:
> 
>> What did this command line look like? Can you provide the configure line as well?
> 
> As in my previous post, the script is as follows:
> 
> (1) debug messages:
>>>> 
> yiguang@gulftown testdmp]$ ./test.bash
> [gulftown:28340] mca: base: components_open: Looking for plm components
> [gulftown:28340] mca: base: components_open: opening plm components
> [gulftown:28340] mca: base: components_open: found loaded component rsh
> [gulftown:28340] mca: base: components_open: component rsh has no register function
> [gulftown:28340] mca: base: components_open: component rsh open function successful
> [gulftown:28340] mca: base: components_open: found loaded component slurm
> [gulftown:28340] mca: base: components_open: component slurm has no register function
> [gulftown:28340] mca: base: components_open: component slurm open function successful
> [gulftown:28340] mca: base: components_open: found loaded component tm
> [gulftown:28340] mca: base: components_open: component tm has no register function
> [gulftown:28340] mca: base: components_open: component tm open function successful
> [gulftown:28340] mca:base:select: Auto-selecting plm components
> [gulftown:28340] mca:base:select:(  plm) Querying component [rsh]
> [gulftown:28340] mca:base:select:(  plm) Query of component [rsh] set priority to 10
> [gulftown:28340] mca:base:select:(  plm) Querying component [slurm]
> [gulftown:28340] mca:base:select:(  plm) Skipping component [slurm]. Query failed to return a module
> [gulftown:28340] mca:base:select:(  plm) Querying component [tm]
> [gulftown:28340] mca:base:select:(  plm) Skipping component [tm]. Query failed to return a module
> [gulftown:28340] mca:base:select:(  plm) Selected component [rsh]
> [gulftown:28340] mca: base: close: component slurm closed
> [gulftown:28340] mca: base: close: unloading component slurm
> [gulftown:28340] mca: base: close: component tm closed
> [gulftown:28340] mca: base: close: unloading component tm
> [gulftown:28340] plm:base:set_hnp_name: initial bias 28340 nodename hash 3546479048
> [gulftown:28340] plm:base:set_hnp_name: final jobfam 17438
> [gulftown:28340] [[17438,0],0] plm:base:receive start comm
> [gulftown:28340] [[17438,0],0] plm:rsh: setting up job [17438,1]
> [gulftown:28340] [[17438,0],0] plm:base:setup_job for job [17438,1]
> [gulftown:28340] [[17438,0],0] plm:rsh: local shell: 0 (bash)
> [gulftown:28340] [[17438,0],0] plm:rsh: assuming same remote shell as local shell
> [gulftown:28340] [[17438,0],0] plm:rsh: remote shell: 0 (bash)
> [gulftown:28340] [[17438,0],0] plm:rsh: final template argv:
>        /usr/bin/rsh <template>  orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100
> [gulftown:28340] [[17438,0],0] plm:rsh:launch daemon already exists on node gulftown
> [gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode001
> [gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],1]
> [gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode001  orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
> bash: orted: command not found
> [gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode002
> [gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],2]
> [gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode002  orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
> bash: orted: command not found
> [gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode003
> [gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode003  orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
> [gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],3]
> bash: orted: command not found
> [gulftown:28340] [[17438,0],0] plm:base:daemon_callback
> <<<
> 
> (2) test.bash script:
>>>> 
> #!/bin/sh -f
> #nohup
> #
> # >-------------------------------------------------------------------------------------------<
> adinahome=/usr/adina/system8.8dmp
> mpirunfile=$adinahome/bin/mpirun
> #
> # Set envars for mpirun and orted
> #
> export PATH=$adinahome/bin:$adinahome/tools:$PATH
> export LD_LIBRARY_PATH=$adinahome/lib:$LD_LIBRARY_PATH
> #
> #
> # run DMP problem
> #
> mcaprefix="--prefix $adinahome"
> mcarshagent="--mca plm_rsh_agent rsh:ssh"
> mcatmpdir="--mca orte_tmpdir_base /tmp"
> mcaopenibmsg="--mca btl_openib_warn_default_gid_prefix 0"
> mcaenvars="-x PATH -x LD_LIBRARY_PATH"
> mcabtlconn="--mca btl openib,sm,self"
> mcaplmbase="--mca plm_base_verbose 100"
> 
> mcaparams="$mcaprefix $mcaenvars $mcarshagent $mcaopenibmsg $mcabtlconn $mcatmpdir $mcaplmbase"
> 
> $mpirunfile $mcaparams --app addmpw-hostname
> <<<
> 
> (3) the content of app file addmpw-hostname:
>>>> 
> -n 1 -host gulftown hostname
> -n 1 -host ibnode001 hostname
> -n 1 -host ibnode002 hostname
> -n 1 -host ibnode003 thostname
> <<<
> 
> Any comments?
> 
> Thanks,
> Yiguang
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

