Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Reuti
Hi,

On 27.03.2012, at 23:46, Hameed Alzahrani wrote:

> When I run any parallel job I get the answer just from the submitting node

What do you mean by "submitting node"? Do you use a queuing system - which one?

-- Reuti

> even when I tried to benchmark the cluster using LINPACK but it looks like the [...]

Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Hameed Alzahrani
Hi,

I mean the node that I run the mpirun command from. I use Condor as a scheduler, but I need to benchmark the cluster either through Condor or directly with Open MPI. When I ran mpirun from one machine and checked the memory status of the three machines that I have, it appeared that the memory usage i [...]

Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Reuti
Hi,

On 28.03.2012, at 16:30, Hameed Alzahrani wrote:

> Hi,
>
> I mean the node that I run the mpirun command from. I use Condor as a
> scheduler, but I need to benchmark the cluster either through Condor or
> directly with Open MPI.

I can't say anything regarding the Condor integration of Open MPI, b [...]

Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Hameed Alzahrani
Hi,

I ran a hello program which returns the host name. When I run it using

    mpirun -np 8 hello

all 8 answers returned from the same machine. When I run it using

    mpirun -np 8 --host host1,host2,host3 hello

I got answers from all the machines, but not from all processors, because I have 8 proce [...]
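[For the archive: the hello program itself is not shown in the thread. A minimal sketch of such a program, using the standard MPI C API, would be:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        char name[MPI_MAX_PROCESSOR_NAME];
        int rank, len;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &len);   /* host this rank runs on */
        printf("Rank %d running on %s\n", rank, name);
        MPI_Finalize();
        return 0;
    }

Each rank prints its host name, which is what makes the placement of the 8 processes visible in the output.]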

Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Reuti
Hi,

On 28.03.2012, at 16:55, Hameed Alzahrani wrote:

> I ran a hello program which returns the host name. When I run it using
> mpirun -np 8 hello
> all 8 answers returned from the same machine. When I run it using
> mpirun -np 8 --host host1,host2,host3 hello
> I got answers from all the machi [...]

Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Hameed Alzahrani
Hi,

Is there a specific name or location for the hostfile? I could not figure out how to specify the number of processors for each machine on the command line.

Regards,

> From: re...@staff.uni-marburg.de
> Date: Wed, 28 Mar 2012 17:21:39 +0200
> To: us...@open-mpi.org
> Subject: Re: [OMPI [...]

Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Reuti
On 28.03.2012, at 17:35, Hameed Alzahrani wrote:

> Hi,
>
> Is there a specific name or location for the hostfile? I could not figure
> out how to specify the number of processors for each machine on the
> command line.

No, just specify the name of (or path to) it with: --hostfile foobar

-- Reuti
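[For the archive: an Open MPI hostfile is a plain text file with one machine per line and an optional slot count per machine. A sketch, with placeholder host names and counts:

    # contents of a hostfile named "foobar" (the name is arbitrary)
    host1 slots=4
    host2 slots=2
    host3 slots=2

It is then passed to mpirun as shown above, e.g.:

    mpirun -np 8 --hostfile foobar hello
]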

Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Hameed Alzahrani
Hi,

Thanks, that works fine when I submit the hello program, but when I tried to benchmark the system it looks like it does not do anything:

    mpirun -np 8 --hostfile hosts xhpl

Regards,

> From: re...@staff.uni-marburg.de
> Date: Wed, 28 Mar 2012 17:40:07 +0200
> To: us...@open-mpi.org
> Subject: Re: [...]
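[One point worth noting here, as an educated guess not confirmed later in the thread: xhpl reads its input file HPL.dat from the current working directory, so it is normally launched from the directory that contains both. A typical invocation, with a placeholder path:

    cd /path/to/hpl/bin/<arch>        # directory containing xhpl and HPL.dat
    mpirun -np 8 --hostfile hosts ./xhpl
]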

Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-03-28 Thread Rodrigo Oliveira
Hi Edgar,

I tested the execution of my code using the option -mca coll ^inter as you suggested and the program worked fine, even when I use 1 server instance. What does this parameter change? I did not find an explanation of what the coll inter module does.

Thanks a [...]
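[For the archive: the caret in an -mca value excludes the named component from the given framework, here the "inter" component of Open MPI's "coll" (collectives) framework. It is passed on the mpirun command line, with a placeholder program name:

    mpirun -np 4 --mca coll ^inter ./my_program
]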

Re: [OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-28 Thread Ralph Castain
What do you have in the "hosts" file? We don't have a native integration with Condor, so you'll have to specify the hosts and the number of slots on each, as Reuti explained. You'll also need to check that your sys admin allows you to ssh without a password to each host.

On Mar 28, 2012, at 10:02 AM [...]
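[Setting up passwordless ssh usually comes down to distributing a public key; a generic sketch, not specific to this cluster, with placeholder user and host names:

    ssh-keygen -t rsa              # accept the defaults, empty passphrase
    ssh-copy-id user@host1         # repeat for host2 and host3
    ssh host1 hostname             # should print the host name without prompting
]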

Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-03-28 Thread Edgar Gabriel
It just uses a different algorithm, which avoids the bcast on a communicator of size 1 (which is causing the problem here).

Thanks
Edgar

On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
> Hi Edgar,
>
> I tested the execution of my code using the option -mca coll ^inter as
> you suggested and the program [...]
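[Rodrigo's code is not shown in the thread; a self-contained sketch of the pattern under discussion - an MPI_Barrier across an inter-communicator whose remote group holds a single process - could look like the following (hypothetical file name spawn_barrier.c):

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Comm parent, inter;
        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);
        if (parent == MPI_COMM_NULL) {
            /* parent: spawn a single child copy of this program */
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);
            MPI_Barrier(inter);    /* barrier on the inter-communicator */
        } else {
            /* child: synchronize with the parent over the same
               inter-communicator */
            MPI_Barrier(parent);
        }
        MPI_Finalize();
        return 0;
    }

Launched with, e.g., mpirun -np 1 --mca coll ^inter ./spawn_barrier to apply the workaround discussed above.]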