Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-15 Thread Jeff Squyres
On Jun 14, 2010, at 5:24 PM, Terry Frankcombe wrote:

> Speaking as no more than an uneducated user, having the behaviour change
> depending on invoking by an absolute path or invoking by some
> unspecified (potentially shell-dependent) path magic seems like a bad
> idea.

FWIW, this specific feature was copied (at the request of multiple users) from 
another MPI implementation.  

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] mpirun jobs only one single node

2010-06-15 Thread Govind Songara
Hi,

I am using an Open MPI build with tm support.
When I run a job requesting two nodes, it runs only on a single node.
Here is my script.
>cat mpipbs-script.sh
#PBS -N mpipbs-script
#PBS -q short
### Number of nodes: resources per node
### (4 cores/node, so ppn=4 is ALL resources on the node)
#PBS -l nodes=2:ppn=4
/opt/openmpi-1.4.2/bin/mpirun /scratch0/gsongara/mpitest/hello


torque config
set queue short resources_max.nodes = 4
set queue short resources_default.nodes = 1
set server resources_default.neednodes = 1
set server resources_default.nodect = 1
set server resources_default.nodes = 1

Can someone please advise if I am missing anything here.

Regards
Govind


Re: [OMPI users] mpirun jobs only one single node

2010-06-15 Thread Ralph Castain
Look at the contents of $PBS_NODEFILE and see how many nodes it contains.
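
For example, something along these lines inside the job script (just a sketch, assuming a stock Torque setup where $PBS_NODEFILE holds one line per allocated slot) distinguishes total slots from distinct nodes:

echo "slots: `wc -l < $PBS_NODEFILE`"
echo "nodes: `sort -u $PBS_NODEFILE | wc -l`"
sort -u $PBS_NODEFILE

With -l nodes=2:ppn=4 you would expect 8 lines covering 2 distinct hostnames.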

On Jun 15, 2010, at 3:56 AM, Govind Songara wrote:

> Hi,
> 
> I am using an Open MPI build with tm support.
> When I run a job requesting two nodes, it runs only on a single node.
> Here is my script.
> >cat mpipbs-script.sh
> #PBS -N mpipbs-script
> #PBS -q short
> ### Number of nodes: resources per node
> ### (4 cores/node, so ppn=4 is ALL resources on the node)
> #PBS -l nodes=2:ppn=4
> /opt/openmpi-1.4.2/bin/mpirun /scratch0/gsongara/mpitest/hello
> 
> 
> torque config
> set queue short resources_max.nodes = 4
> set queue short resources_default.nodes = 1
> set server resources_default.neednodes = 1
> set server resources_default.nodect = 1
> set server resources_default.nodes = 1
> 
> Can someone please advise if I am missing anything here.
> 
> Regards
> Govind



Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-15 Thread Jeff Squyres
On Jun 14, 2010, at 3:13 PM, Reuti wrote:

> > bash: -c: line 0: syntax error near unexpected token `('
> > bash: -c: line 0: ` PATH=/OMPI_dir/bin:$PATH ; export PATH ; 
> > LD_LIBRARY_PATH=/OMPI_dir/lib:$LD_LIBRARY_PATH ; export 
> > LD_LIBRARY_PATH ; /some_path/myscript /OMPI_dir/bin/(null) --
> > daemonize -mca ess env -mca orte_ess_jobid 1978662912 -mca 
> > orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri 
> > "1978662912.0;tcp://180.0.14.12:54844;tcp://190.0.14.12:54844"'

The problem is that "(null)" in the middle.  We'll have to dig into how that 
got there...  Reuti's probably right that something is somehow NULL in there, 
and glibc is snprintf'ing (null) instead of SEGV'ing.

Ralph and I are talking about this issue, but we're hindered by the fact that 
I'm at the MPI Forum this week (i.e., meetings are taking up all my days).  I 
haven't had a chance to look at the code in depth yet.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-15 Thread Reuti
Am 15.06.2010 um 14:52 schrieb Jeff Squyres:

> On Jun 14, 2010, at 3:13 PM, Reuti wrote:
> 
>>> bash: -c: line 0: syntax error near unexpected token `('
>>> bash: -c: line 0: ` PATH=/OMPI_dir/bin:$PATH ; export PATH ; 
>>> LD_LIBRARY_PATH=/OMPI_dir/lib:$LD_LIBRARY_PATH ; export 
>>> LD_LIBRARY_PATH ; /some_path/myscript /OMPI_dir/bin/(null) --
>>> daemonize -mca ess env -mca orte_ess_jobid 1978662912 -mca 
>>> orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri 
>>> "1978662912.0;tcp://180.0.14.12:54844;tcp://190.0.14.12:54844"'
> 
> The problem is that "(null)" in the middle.  We'll have to dig into how that 
> got there...  Reuti's probably right that something is somehow NULL in there, 
> and glibc is snprintf'ing (null) instead of SEGV'ing.

I think the problem is not only the (null) itself, but also the "prefix_dir" and 
"bin_base" that are output (unless the launch-agent were to ignore or interpret 
$1 and $2 in a proper way). The (null) is then the content of "orted_cmd".

-- Reuti


> 
> Ralph and I are talking about this issue, but we're hindered by the fact that 
> I'm at the MPI Forum this week (i.e., meetings are taking up all my days).  I 
> haven't had a chance to look at the code in depth yet.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 




Re: [OMPI users] mpirun jobs only one single node

2010-06-15 Thread Govind Songara
I added a $PBS_NODEFILE check to the script from my last email (quoted below).
It shows only one node; here is the output:
===
node47.beowulf.cluster node47.beowulf.cluster node47.beowulf.cluster
node47.beowulf.cluster
This job has allocated 4 nodes
Hello World! from process 1 out of 4 on node47.beowulf.cluster
Hello World! from process 2 out of 4 on node47.beowulf.cluster
Hello World! from process 3 out of 4 on node47.beowulf.cluster
Hello World! from process 0 out of 4 on node47.beowulf.cluster
===

On 15 June 2010 13:41, Ralph Castain  wrote:

> Look at the contents of $PBS_NODEFILE and see how many nodes it contains.
>
> On Jun 15, 2010, at 3:56 AM, Govind Songara wrote:
>
> Hi,
>
> I am using an Open MPI build with tm support.
> When I run a job requesting two nodes, it runs only on a single node.
> Here is my script.
> >cat mpipbs-script.sh
> #PBS -N mpipbs-script
> #PBS -q short
> ### Number of nodes: resources per node
> ### (4 cores/node, so ppn=4 is ALL resources on the node)
> #PBS -l nodes=2:ppn=4
>
> echo `cat $PBS_NODEFILE`
> NPROCS=`wc -l < $PBS_NODEFILE`
> echo This job has allocated $NPROCS nodes
>
>   /opt/openmpi-1.4.2/bin/mpirun /scratch0/gsongara/mpitest/hello


torque config
set queue short resources_max.nodes = 4
set queue short resources_default.nodes = 1
set server resources_default.neednodes = 1
set server resources_default.nodect = 1
set server resources_default.nodes = 1

Can someone please advise if I am missing anything here.

Regards
Govind



Re: [OMPI users] mpirun jobs only one single node

2010-06-15 Thread Ralph Castain
That's what I suspected. I suggest you talk to your sys admin about how PBS is 
configured - looks like you are only getting one node allocated despite your 
request for two. Probably something in the config needs adjusting.
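
If it does turn out that the server-wide defaults are clamping the request, one thing the admin might try (just a sketch; whether these defaults are really the cause depends on the local Torque configuration) is dropping the one-node defaults with qmgr and resubmitting:

qmgr -c "unset server resources_default.nodes"
qmgr -c "unset server resources_default.nodect"
qmgr -c "unset server resources_default.neednodes"

A fresh submission with -l nodes=2:ppn=4 should then show 8 entries across two distinct hosts in $PBS_NODEFILE.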

On Jun 15, 2010, at 7:20 AM, Govind Songara wrote:

> I added a $PBS_NODEFILE check to the script from my last email (quoted below).
> It shows only one node; here is the output:
> ===
> node47.beowulf.cluster node47.beowulf.cluster node47.beowulf.cluster 
> node47.beowulf.cluster
> This job has allocated 4 nodes
> Hello World! from process 1 out of 4 on node47.beowulf.cluster
> Hello World! from process 2 out of 4 on node47.beowulf.cluster
> Hello World! from process 3 out of 4 on node47.beowulf.cluster
> Hello World! from process 0 out of 4 on node47.beowulf.cluster
> ===
> 
> On 15 June 2010 13:41, Ralph Castain  wrote:
> Look at the contents of $PBS_NODEFILE and see how many nodes it contains.
> 
> On Jun 15, 2010, at 3:56 AM, Govind Songara wrote:
> 
>> Hi,
>> 
>> I am using an Open MPI build with tm support.
>> When I run a job requesting two nodes, it runs only on a single node.
>> Here is my script.
>> >cat mpipbs-script.sh
>> #PBS -N mpipbs-script
>> #PBS -q short
>> ### Number of nodes: resources per node
>> ### (4 cores/node, so ppn=4 is ALL resources on the node)
>> #PBS -l nodes=2:ppn=4
> 
>> echo `cat $PBS_NODEFILE`
>> NPROCS=`wc -l < $PBS_NODEFILE`
>> echo This job has allocated $NPROCS nodes
> 
>   /opt/openmpi-1.4.2/bin/mpirun /scratch0/gsongara/mpitest/hello
> 
> 
> torque config
> set queue short resources_max.nodes = 4
> set queue short resources_default.nodes = 1
> set server resources_default.neednodes = 1
> set server resources_default.nodect = 1
> set server resources_default.nodes = 1
> 
> Can someone please advise if I am missing anything here.
> 
> Regards
> Govind