Re: [OMPI users] another mpirun + xgrid question

2007-09-10 Thread Neeraj Chourasia
If you are using a scheduler such as PBS or SGE to launch MPI jobs, there are
prolog and epilog hooks where you can supply scripts that do the copy
operation. As the names suggest, these scripts run before and after job
execution.

Whether Open MPI itself can stage the executable without a scheduler is
something I would have to check.

The alternative is to keep a copy of the program at the same location
on all compute nodes and then launch mpirun.

If the executable lives at different locations on the compute nodes, you have
to specify the per-node paths as mpirun command-line arguments.

On Mon, 2007-09-10 at 15:35 -0400, Lev Givon wrote:
> When launching an MPI program with mpirun on an xgrid cluster, is
> there a way to cause the program being run to be temporarily copied to
> the compute nodes in the cluster when executed (i.e., similar to what the
> xgrid command line tool does)? Or is it necessary to make the program
> being run available on every compute node (e.g., using NFS data
> partitions)?
> 
>   L.G.
> 




[OMPI users] libnbc compilation

2007-10-01 Thread Neeraj Chourasia
Hello everyone,

    I was checking the development version from svn and found that support for
libnbc is expected in the next release. I tried to compile it, but failed.
Could someone suggest how to get it compiled? When I made changes to the
configure script (basically added some flags), it reported that libnbc can't
be compiled. Any help would be appreciated.

regards
Neeraj


[OMPI users] Query regarding GPR

2007-10-09 Thread Neeraj Chourasia
Hi everybody,

    I have a question about ORTE. One of the major functions of ORTE is to
maintain the GPR (General Purpose Registry), which subscribes and publishes
information to the universe. When we submit a job from a machine, where does
the GPR get created? Is it on the submitting machine (the HNP)? If yes, how do
the compute nodes get that information during execution? Do they use the OOB
for it?

-Neeraj


[OMPI users] Tuning Openmpi with IB Interconnect

2007-10-11 Thread Neeraj Chourasia
Dear All,

    Could anyone tell me the important tuning parameters in Open MPI for an IB
interconnect? I tried setting the eager_rdma, min_rdma_size, and
mpi_leave_pinned parameters from the mpirun command line on a 38-node cluster
(38*2 processors), but in vain. I found plain mpirun with no MCA parameters
performing better. I tested point-to-point send/receive with a data size of
8 MB.

    Similarly, I patched the HPL Linpack code with libnbc (non-blocking
collectives) and found no performance benefit. I went through the patch and
found that it is probably not overlapping computation with communication.
Any help in this direction would be appreciated.

-Neeraj
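As an illustration of the computation/communication overlap idea raised here
(this is plain non-blocking MPI point-to-point, not LibNBC's collective API),
a minimal sketch assuming exactly two ranks; the buffer size and the dummy
work loop are arbitrary choices for the example:

/* Sketch: overlap communication with computation using non-blocking MPI.
   Assumes exactly two ranks; message size and workload are arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const int n = 1 << 20;            /* 1M doubles, ~8 MB */
    int rank, i, peer;
    double *sendbuf, *recvbuf, local = 0.0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = (rank == 0) ? 1 : 0;

    sendbuf = calloc(n, sizeof(double));
    recvbuf = calloc(n, sizeof(double));

    /* Start the exchange ... */
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... do independent computation while the transfer is (ideally) in flight ... */
    for (i = 0; i < n; i++)
        local += (double)i * 0.5;

    /* ... then complete the communication. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d done (dummy result %f)\n", rank, local);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Whether any real overlap occurs depends on the MPI progress engine and the
interconnect, which is exactly the application-dependence discussed in the
follow-up messages below.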


[OMPI users] Re :Re: Tuning Openmpi with IB Interconnect

2007-10-11 Thread Neeraj Chourasia
Hi,

    The code was pretty simple. I was sending 8 MB of data from one rank to
another in a loop (say 1000 iterations), then taking the average of the time
taken and calculating the bandwidth. I tried this both with
mpirun-with-MCA-parameters and without any parameters, and to my surprise the
performance degraded whenever I tried to tune it.

    Now I have another question in mind. Is it possible to have an IB hardware
multicast implementation in Open MPI? I have gone through the
issues/challenges involved, and have also read about a couple of people who
have successfully done it for Ethernet/Gigabit Ethernet and IPoIB, of course
at an experimental stage. I actually want to contribute this to Open MPI and
need help with it.

-Neeraj

On Thu, 11 Oct 2007 12:01:39 +0200 Open MPI Users wrote:

Hi Neeraj,

> Could anyone tell me the important tuning parameters in openmpi with
> IB interconnect? I tried setting eager_rdma, min_rdma_size,
> mpi_leave_pinned parameters from the mpirun command line on 38 nodes
> cluster (38*2 processors) but in vain. I found simple mpirun with no mca
> parameters performing better. I conducted test on P2P send/receive with
> data size of 8MB.

The performance of the BTL with different parameters depends heavily on
the code that you run. E.g., leave_pinned works very well with many
microbenchmarks (e.g., bandwidth/overlap-wise) but may not perform well
with real applications that use different memory regions. It's pretty
much the same with the other parameters. The default values are
considered best for many applications. Can you provide us any details
about the code you're running to test performance?

> Similarly i patched HPL linpack code with libnbc (non blocking
> collectives) and found no performance benefits. I went through its patch
> and found that, its probably not overlapping computation with
> communication.

Ah, so there are two things. LibNBC provides overlap, most overlap is
achieved if memory regions are reused and leave_pinned is activated. But
again, this is highly application-dependent. However, the patch for the
Linpack code (I guess you refer to the patch from the LibNBC webpage
[1]) is in experimental stage (as the website says) and is not properly
tested for performance benefit. The original HPL provides something like
a broadcast start and broadcast end phase. I just replaced them with
non-blocking calls to NBC_Ibcast() and did not find the time to do any
performance/code analysis yet. Any input by HPL experts is appreciated!

Best,
Torsten

[1]: http://www.unixer.de/research/nbcoll/hpl/
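For reference, a minimal sketch of the kind of bandwidth test described above
(a fixed number of 8 MB sends between two ranks, timed with MPI_Wtime); the
iteration count and message size simply mirror the numbers quoted in the post,
and a production benchmark such as NetPIPE or IMB would add warm-up rounds and
tighter synchronization:

/* Sketch of the simple two-rank bandwidth test described above: rank 0
   sends an 8 MB buffer to rank 1 repeatedly and reports the average
   bandwidth.  Sender-side timing is optimistic; a real benchmark would
   ping-pong or acknowledge the last message. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const int iters = 1000;
    const size_t nbytes = 8u << 20;        /* 8 MB */
    int rank, i;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = calloc(nbytes, 1);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0)
            MPI_Send(buf, (int)nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, (int)nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg bandwidth: %.2f MB/s\n",
               (double)nbytes * iters / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}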


[OMPI users] Re :Re: Re :Re: Tuning Openmpi with IB Interconnect

2007-10-12 Thread Neeraj Chourasia
Yes, the buffer was being re-used. No, we didn't try to benchmark it with
NetPIPE and the other tools, but the program was pretty simple. Do you think I
need to test it with bigger chunks (>8 MB) of communication? We also tried
manipulating eager_limit and min_rdma_size, but with no success.

Neeraj

On Fri, 12 Oct 2007 13:00:10 +0200 Open MPI Users wrote:

Hello,

> The code was pretty simple. I was trying to send 8MB data from one
> rank to other in a loop (say 1000 iterations). And then i was taking the
> average of time taken and was calculating the bandwidth.
>
> The above logic i tried with both mpirun-with-mca-parameters and without
> any parameters. And to my surprise, the performance was degrading when i
> was trying to manipulate.

That sounds strange. So did you re-use the communication buffers? Did
you try to run some existing benchmarks like Netpipe [1], IMB or
Netgauge [2]?

> Now I have another question in mind. Is it possible to have IB Hardware
> Multicast implementation in OpenMPI? I have gone through the
> issues/challenges for the same, but also read couple of people who have
> successfully done it for Ethernet/Giga-bit Ethernet and IPoIB ofcourse in
> experimental stage. Actually i want to contribute for it in OpenMPI and
> need the help for the same.

As far as I know, there are two groups/people working on this. Andy
Friedley implements a "traditional" ACK based approach (like the one
that the OSU folks published about some time ago) and I implemented a
new idea for extreme scale (see "A practically constant-time MPI
Broadcast Algorithm for large-scale InfiniBand Clusters with
Multicast" [3]). I know that my version is still unstable and has some
problems. But I'm working on this.

Best,
Torsten

[1]: http://www.scl.ameslab.gov/netpipe/
[2]: http://www.unixer.de/research/netgauge/
[3]: https://www.unixer.de/publications/#hoefler-cac07


[OMPI users] Compile test programs

2007-10-18 Thread Neeraj Chourasia
Hi all,

    Could someone suggest how to compile the programs in the test directory of
the source code? There are a couple of directories under test that contain
sample programs showing the usage of the data structures used by Open MPI. I
am able to compile some of the directories, since they get a Makefile created
when the configure script runs, but a few of them, such as runtime, don't have
a Makefile. Please help me compile them.

-Neeraj


[OMPI users] OpenMPI 1.2.4 vs 1.2

2007-10-24 Thread Neeraj Chourasia
Hello guys,

    I had Open MPI v1.2 installed on my cluster. A couple of days back I
decided to upgrade to v1.2.4 (the latest release, I suppose). Since I didn't
want to take any risk, I first installed it in a temporary location and ran
the bandwidth and bidirectional bandwidth tests provided by the OSU guys, and
to my surprise the old version performs better in both scenarios. Could anyone
give me a reason for this? I repeated the point-to-point tests between all
pairs of nodes, but the results were the same :(

-Neeraj


[OMPI users] Re :Re: Process 0 with different time executing the same code

2007-10-26 Thread Neeraj Chourasia
Hi,

    Please check that the following things are correct:
1) The array bounds are equal, i.e. "my_x" and "size_y" have the same value on all nodes.
2) The nodes are homogeneous. To check that, you could pick a different node as root and re-run the program.

-Neeraj

On Fri, 26 Oct 2007 10:13:15 +0500 (PKT) Open MPI Users wrote:

Thanks for your reply,

I used MPI_Wtime for my application, but even then process 0 took longer
executing the mentioned code segment. I might be wrong, but what I see is
that process 0 takes more time to access the array elements than the other
processes. Now I don't see what to do, because the mentioned code segment
is creating a bottleneck for the timing of my application. Can anyone
suggest something in this regard? I will be very thankful.

regards
Aftab Hussain

On Thu, October 25, 2007 9:38 pm, jody wrote:
> Hi
> I'm not sure if that is a problem,
> but in MPI applications you should use MPI_Wtime() for time-measurements
>
> Jody
>
> On 10/25/07, 42af...@niit.edu.pk wrote:
>
>> Hi all,
>> I am a research assistant (RA) at NUST Pakistan in the High Performance
>> Scientific Computing Lab. I am working on the parallel implementation of
>> the Finite Difference Time Domain (FDTD) method using MPI. I am using the
>> OpenMPI environment on a cluster of 4 SunFire v890 nodes connected through
>> Myrinet. The problem I am having is that when I run my code with, say, 4
>> processes, process 0 takes about 3 times more time than the other three
>> processes to execute a for loop, which is the main cause of load imbalance
>> in my code. Below is the code that is causing the problem. It is run by
>> all the processes simultaneously and independently, and I have timed it
>> independently of the other segments of code.
>>
>> start = gethrtime();
>> for (m = 1; m < my_x; m++) {
>>     for (n = 1; n < size_y-1; n++) {
>>         Ez(m,n) = Ez(m,n) + cezh*((Hy(m,n) - Hy(m-1,n)) - (Hx(m,n) - Hx(m,n-1)));
>>     }
>> }
>> stop = gethrtime();
>> time = (stop-start);
>>
>> In my implementation I used 1-D arrays to realize 2-D arrays. I have used
>> the following macros for accessing the array elements.
>>
>> #define Hx(I,J) hx[(I)*(size_y) + (J)]
>> #define Hy(I,J) hy[(I)*(size_y) + (J)]
>> #define Ez(I,J) ez[(I)*(size_y) + (J)]
>>
>> Can anyone tell me what I am doing wrong here, whether the macros are
>> creating the problem, or whether it could be related to an OS issue? I
>> will be looking forward to help, because this problem has stopped my
>> progress for the last two weeks.
>>
>> regards aftab hussain
>>
>> RA, High Performance Scientific Computing Lab
>> NUST Institute of Information Technology
>> National University of Sciences and Technology, Pakistan
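To make jody's MPI_Wtime() suggestion concrete, a minimal sketch of timing a
code segment with the portable MPI timer instead of a platform-specific call
such as gethrtime(); the work loop is only a placeholder, not the FDTD update:

/* Sketch: timing a per-rank code segment with MPI_Wtime().
   The loop body is a placeholder workload, not the actual FDTD update. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, i;
    double t0, t1, local = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();                  /* wall-clock time in seconds */
    for (i = 0; i < 10000000; i++)     /* placeholder workload */
        local += i * 1e-7;
    t1 = MPI_Wtime();

    printf("rank %d: segment took %f s (dummy=%f)\n", rank, t1 - t0, local);

    MPI_Finalize();
    return 0;
}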


[OMPI users] MPI_Send issues with openib btl

2007-10-26 Thread Neeraj Chourasia
Hi,

    We are facing a problem when calling MPI_Send over IB. The problem looks
similar to ticket https://svn.open-mpi.org/trac/ompi/ticket/232, but this time
it is for the IB interface. When the program is forced to run with
--mca btl tcp,self it runs fine.

    On IB, it gives error messages such as local protocol error, flush error,
invalid request error, and local length error. Any help would be appreciated.

-Neeraj


[OMPI users] OpenMP and OpenMPI Issue

2007-10-30 Thread Neeraj Chourasia
Hi folks,

    I have been seeing some nasty behaviour in MPI_Send/Recv with a large
dataset (8 MB) when OpenMP and Open MPI are used together over an IB
interconnect. Attached is a program.

    The code first calls MPI_Init_thread(), followed by the OpenMP
thread-creation API. The program works fine if only one side sends [thread 0
of process 0 sending some data to a thread of process 1], but it hangs if both
sides try to send data (8 MB) over the IB interconnect.

    Interestingly, the program works fine if we send short data (1 MB or below).

    I see this with
    openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
    ofed 1.2
    2.6.9-42.4sp.XCsmp
    icc (Intel Compiler)

    compiled as
    mpicc -O3 -openmp temp.c
    run as
    mpirun -np 2 -hostfile nodelist a.out

    The error I am getting is
--------------------------------------------------------------------------
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for wr_id 6391728 opcode 0
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 6854256 opcode 128
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6920112 opcode 0
--------------------------------------------------------------------------

    Anyone else seeing anything similar? Any ideas for workarounds?

    As a point of reference, the program works fine if we force Open MPI to
select the TCP interconnect using --mca btl tcp,self.

-Neeraj

/* Test case attached to the post above: two ranks, two OpenMP threads each;
   on each rank one thread sends and the other receives.  The hang reported
   above was seen with 8 MB messages (i.e. MAX around 1048576 doubles);
   MAX is left at 100 as in the original attachment. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>
#include "time.h"

#define MAX 100

int main(int argc, char *argv[])
{
    int         required = MPI_THREAD_MULTIPLE;
    int         provided;
    int         rank;
    int         size;
    int         id;
    MPI_Status  status;
    double      *buff1, *buff2;

    MPI_Init_thread(&argc, &argv, required, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buff1 = (double *)malloc(sizeof(double)*MAX);
    buff2 = (double *)malloc(sizeof(double)*MAX);

    omp_set_num_threads(2);

    #pragma omp parallel private(id)
    {
        id = omp_get_thread_num();
        if (rank == 0)
        {
            if (id == 0)
                MPI_Send(buff1, MAX, MPI_DOUBLE, 1, rank, MPI_COMM_WORLD);
            else
                MPI_Recv(buff2, MAX, MPI_DOUBLE, 1, 1234, MPI_COMM_WORLD, &status);
        }
        if (rank == 1)
        {
            if (id == 0)
                MPI_Recv(buff1, MAX, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            else
                MPI_Send(buff2, MAX, MPI_DOUBLE, 0, 1234, MPI_COMM_WORLD);
        }
    }
    printf("rank = %d %d \n", rank, provided);
    free(buff1);
    free(buff2);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}


[OMPI users] Re :Re: OpenMP and OpenMPI Issue

2007-11-01 Thread Neeraj Chourasia
Thanks for your reply,

    but the program does run over the TCP interconnect with the same data
size, and also over IB with a small data size, say 1 MB. So I don't think the
problem is in Open MPI in general; it has to do with the IB logic, which
probably doesn't work well with threads. I also tried the program with
MPI_THREAD_SERIALIZED, but in vain. When is version 1.3 scheduled to be
released? Would it fix such issues?

Correct me if I am wrong.

-Neeraj

On Wed, 31 Oct 2007 05:31:32 -0700 Open MPI Users wrote:

THREAD_MULTIPLE support does not work in the 1.2 series. Try turning it off.

On Oct 30, 2007, at 12:17 AM, Neeraj Chourasia wrote:

> Hi folks,
>
> I have been seeing some nasty behaviour in MPI_Send/Recv with large
> dataset (8 MB), when used with OpenMP and Openmpi together with IB
> Interconnect. Attached is a program.
> [...]
> As a point of reference, program works fine, if we force openmpi to
> select TCP interconnect using --mca btl tcp,self.
>
> -Neeraj

--
Jeff Squyres
Cisco Systems
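Following up on the advice that THREAD_MULTIPLE is not usable in the 1.2
series, a minimal sketch of requesting a lower thread level and checking what
the library actually grants before relying on it; the warning message is just
an illustration:

/* Sketch: request a thread level and check what MPI actually granted.
   With Open MPI 1.2 one would request MPI_THREAD_FUNNELED (or SINGLE)
   and keep all MPI calls on the master thread. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int required = MPI_THREAD_FUNNELED;   /* only the main thread calls MPI */
    int provided, rank;

    MPI_Init_thread(&argc, &argv, required, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (provided < required && rank == 0)
        fprintf(stderr,
                "warning: requested thread level %d, got %d; "
                "restrict MPI calls accordingly\n", required, provided);

    /* ... application code: MPI calls only from the master thread ... */

    MPI_Finalize();
    return 0;
}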


[OMPI users] Adding new API

2007-11-05 Thread Neeraj Chourasia
Hello everyone,

    I want to add an extra API to be used by application developers. This API
has to be callable from a C application and compiled and linked with mpicc.
But I am getting undefined references, even though I am exporting it in the
source code. Could someone tell me which steps I should take care of?

-Neeraj
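The exact steps depend on where the new call is added in the Open MPI tree,
but as a generic illustration of the linking problem (all names here are
hypothetical): an undefined reference usually means the prototype is visible
to the application while the object or library containing the implementation
never makes it onto mpicc's link line.

/* Sketch of the linking issue only -- names are hypothetical.  In a real
   setup my_extra_call() would live in its own source file (or inside the
   Open MPI library); an "undefined reference" from mpicc usually means
   that object/library is missing from the link line, e.g.
       mpicc -c myext.c
       mpicc app.c myext.o -o app
   Everything is kept in one file here so the sketch compiles standalone. */
#include <stdio.h>
#include <mpi.h>

int my_extra_call(int value);          /* would normally come from myext.h */

int my_extra_call(int value)           /* would normally live in myext.c   */
{
    return value * 2;                  /* placeholder body */
}

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    printf("%d\n", my_extra_call(21));
    MPI_Finalize();
    return 0;
}

If the function is added inside the Open MPI source tree instead, the
corresponding Makefile.am has to list the new source file so that the symbol
ends up in the installed library the wrapper compiler links against.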


[OMPI users] version 1.3

2007-11-28 Thread Neeraj Chourasia
Hello Guys,

   When is version 1.3 scheduled to be released? Since it should contain 
checkpointing, a library for non-blocking communication, and ConnectX support 
for QPs, it would be great to have it as soon as possible. I am evaluating 
MVAPICH against Open MPI and found that MVAPICH still has the upper hand in 
terms of checkpointing, but I am pretty sure that once v1.3 comes out it will 
help the HPC community a lot.

I can find the development trunk version, but I am more interested in the 
production release.

-Neeraj
  


Re: [OMPI users] OpenIB problems

2007-11-29 Thread Neeraj Chourasia
Hi Guys,

   The alternative workaround for the THREAD_MULTIPLE problem is to pass 
--mca mpi_leave_pinned 1 to mpirun. This ensures a single RDMA operation 
instead of splitting the data into chunks of the maximum RDMA size (which 
defaults to 1 MB).

If your data size is small, say below 1 MB, the program runs well with 
THREAD_MULTIPLE. The problem appears when the data size grows and Open MPI 
starts splitting it.

I think that even with bigger sizes the program works if the interconnect is 
TCP, but it fails on IB. So on IB you can run your program if you set the MCA 
parameter mpi_leave_pinned to 1.

Cheers
Neeraj



On Thu, 29 Nov 2007 Brock Palen wrote :
>Jeff thanks for all the reply's,
>
>Hate to admit but at the moment we can't log onto the switch.
>
>But the ibcheckerrors command returns nothing out of bounds, and i
>think that command also checks the switch ports.
>
>Thanks, we will do some tests
>
>Brock Palen
>Center for Advanced Computing
>bro...@umich.edu
>(734)936-1985
>
>
>On Nov 27, 2007, at 4:50 PM, Jeff Squyres wrote:
>
> > Sorry for jumping in late; the holiday and other travel prevented me
> > from getting to all my mail recently...  :-\
> >
> > Have you checked the counters on the subnet manager to see if any
> > other errors are occurring?  It might be good to clear all the
> > counters, run the job, and see if the counters are increasing faster
> > than they should (i.e., any particular counter should advance very
> > very slowly -- perhaps 1 per day or so).
> >
> > I'll ask around the kernel-level guys (i.e., Roland) to see what else
> > could cause this kind of error.
> >
> >
> >
> > On Nov 27, 2007, at 3:35 PM, Brock Palen wrote:
> >
> >> Ok i will open a case with cisco,
> >>
> >>
> >> Brock Palen
> >> Center for Advanced Computing
> >> bro...@umich.edu
> >> (734)936-1985
> >>
> >>
> >> On Nov 27, 2007, at 4:19 PM, Andrew Friedley wrote:
> >>
> >>>
> >>>
> >>> Brock Palen wrote:
> >> What would be a place to look?  Should this just be default then
> >> for
> >> OMPI?  ompi_info shows the default as 10 seconds?  Is that right
> >> 'seconds' ?
> > The other IB guys can probably answer better than I can -- I'm
> > not an
> > expert in this part of IB (or really any part I guess :).  Not
> > sure
> > why
> > a larger value isn't the default.  No, its not seconds -- check
> > the
> > description of the MCA parameter:
> >
> > 4.096 microseconds * (2^btl_openib_ib_timeout)
> 
>  You sure?
>  ompi_info --param btl openib
> 
>  MCA btl: parameter "btl_openib_ib_timeout" (current value: "10")
>    InfiniBand transmit timeout, in seconds
>  (must be >= 1)
> >>>
> >>> Yeah:
> >>>
> >>> MCA btl: parameter "btl_openib_ib_timeout" (current value: "10")
> >>>  InfiniBand transmit timeout, plugged into formula:
> >>>  4.096 microseconds * (2^btl_openib_ib_timeout)
> >>>  (must be >= 0 and <= 31)
> >>>
> >>> Reading earlier in the thread you said OMPI v1.2.0, I got this
> >>> from a
> >>> trunk checkout thats around 3 weeks old.  A quick check shows this
> >>> description was changed between 1.2.0 and 1.2.1.  However the use of
> >>> this parameter hasn't changed -- it's simply passed along to IB
> >>> verbs
> >>> when creating a queue pair (aka a connection).
> >>>
> >>> Andrew
> >>> ___
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>>
> >>
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
>
>___
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] what is MPI_IN_PLACE

2007-12-11 Thread Neeraj Chourasia
Hello everyone,

    While going through the collective algorithms, I came across the
preprocessor define MPI_IN_PLACE, which is (void *)1. It is always being
compared against the source buffer (sbuf). My question is: when would the
condition MPI_IN_PLACE == sbuf be true? As far as I understand, sbuf is the
address of the source buffer that every node has to transfer to the remaining
nodes based on recursive doubling or, say, Bruck's algorithm, and it can never
be equal to (void *)1. Any help is appreciated.

Regards
Neeraj


[OMPI users] Re :Re: what is MPI_IN_PLACE

2007-12-11 Thread Neeraj Chourasia
Thanks George,

    But why does the user have to specify it? The API could check the
addresses of the input and output buffers itself. Is there some extra
advantage of MPI_IN_PLACE over detecting it automatically from the pointers?

-Neeraj

On Tue, 11 Dec 2007 06:10:06 -0500 Open MPI Users wrote:

Neeraj,

MPI_IN_PLACE is defined by the MPI standard in order to allow the
users to specify that the input and output buffers for the collectives
are the same. Moreover, not all collectives support MPI_IN_PLACE and
for those that support it some strict rules apply. Please read the
collective section in the MPI standard to see all the restrictions.

  Thanks,
    george.

On Dec 11, 2007, at 5:56 AM, Neeraj Chourasia wrote:

> Hello everyone,
>
> While going through collective algorithms, I came across
> preprocessor directive MPI_IN_PLACE which is (void *)1. Its always
> being compared against source buffer (sbuf). My question is when
> MPI_IN_PLACE == sbuf condition would be true. As far as i
> understand, sbuf is the address of source buffer, which every node
> has to transfer to remaining nodes based on recursive doubling or
> say bruck algo. And it can never be equal to (void *)1. Any help is
> appreciated.
>
> Regards
> Neeraj
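For illustration, a minimal sketch of the user-side distinction George
describes: passing MPI_IN_PLACE as the send buffer of MPI_Allreduce tells the
library that the receive buffer already holds the input, rather than supplying
two separate buffers:

/* Sketch: MPI_Allreduce with separate buffers vs. MPI_IN_PLACE.
   In the second call the input is taken from (and the result written to)
   the same buffer, which is what the sbuf == MPI_IN_PLACE check detects. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, in, out, inplace;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Variant 1: distinct send and receive buffers. */
    in = rank + 1;
    MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Variant 2: in place -- the receive buffer is also the input. */
    inplace = rank + 1;
    MPI_Allreduce(MPI_IN_PLACE, &inplace, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %d (separate), %d (in place)\n", out, inplace);

    MPI_Finalize();
    return 0;
}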
 


[OMPI users] orte in persistent mode

2007-12-31 Thread Neeraj Chourasia
Dear all,

    I am wondering whether ORTE can be run in persistent mode. This has
already been raised on the mailing list
(http://www.open-mpi.org/community/lists/users/2006/03/0939.php), where it was
said that the problem was still there. I just want to know whether it has been
fixed or is being fixed.

    The reason I am asking is that on large clusters mpirun takes a lot of
time starting orted (via ssh) on the remote nodes. If ORTE were already
running, we could hopefully save considerable time. Any comments are
appreciated.

-Neeraj


[OMPI users] Openmpi with SGE

2008-02-20 Thread Neeraj Chourasia
Hello everyone,

    I am facing a problem when calling mpirun in a loop under SGE. My SGE
version is SGE6.1AR_snapshot3. The script I am submitting via SGE is:

let i=0
while [ $i -lt 100 ]
do
    echo ""
    echo "Iteration :$i"
    /usr/local/openmpi-1.2.4/bin/mpirun -np $NP -hostfile $TMP/machines send
    let "i+=1"
    echo ""
done

The above script runs well for 15-20 iterations and then fails with the
following message:

-------------------------------Error Message-------------------------------
error: executing task of job 3869 failed: execution daemon on host "n101" didn't accept task
[n199:11989] ERROR: A daemon on node n101 failed to start as expected.
[n199:11989] ERROR: There may be more information available from
[n199:11989] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[n199:11989] ERROR: If the problem persists, please restart the
[n199:11989] ERROR: Grid Engine PE job
[n199:11989] ERROR: The daemon exited unexpectedly with status 1.
----------------------------------------------------------------------------

When I ssh to n101, there is no orted or qrsh_starter running. While checking
its spool file, I came across the following message:

-------------------------Execd spool Error Message-------------------------
|execd|n101|E|no free queue for job 3869 of user neeraj@n199 (localhost = n101)
----------------------------------------------------------------------------

What could be the reason for this? While checking the mailing list I came
across the following link:
http://www.open-mpi.org/community/lists/users/2007/03/2771.php
but I don't think it is the same problem. Any help is appreciated.

Regards
Neeraj


[OMPI users] RDMA-CM

2008-06-17 Thread Neeraj Chourasia
Hello everyone,

    I downloaded the openmpi-1.3 version from the nightly tarballs to check
the RDMA-CM support. I am able to compile and install it, but I don't know how
to run with it, as there is no documentation provided. Has anyone tried
running it with Open MPI?

    My other question is: does Open MPI 1.3 have progress-thread support for
IB? When compiling with that option it didn't give me any warnings or
failures, unlike the openmpi-1.2.X series.

Regards
Neeraj


[OMPI users] Re :Re: Linpack Benchmark and File Descriptor Limits

2008-09-19 Thread Neeraj Chourasia
Hello,    With openmpi-1.3,  new mca feature is introduced 
namely --mca routed binomial. This ensures out of band communication to happen 
in binomial fashion and reduces the net socket opening and hence solves file 
open issues.-NeerajOn Thu, 18 Sep 2008 16:46:23 -0700 Open MPI Users  wrote  
I'm just running it using mpirun from the command line. Thanks for the reply.   
 On Thu, Sep 18, 2008 at 4:35 PM, John Hearns  wrote:2008/9/18 
Alex Wolfe   Hello,I am trying to run the HPL benchmarking software on 
a new 1024 core cluster that we have set up. Unfortunately I'm hitting the 
"mca_oob_tcp_accept: accept() failed: Too many open files (24)" error known in 
verson 1.2 of openmpi. No matter what I set the file-descriptor limit for my 
account to, I am still limited to only 808 or so processes. Does anyone have 
any suggestions?  Are you running the Linpack via a batch system or 
just using mpirun from the command line?If via a batch system, looks for 
FAQs on how to set the resource limits for that batch system.  
___users mailing list
us...@open-mpi.orghttp://www.open-mpi.org/mailman/listinfo.cgi/users
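As a small diagnostic aside on the "Too many open files" symptom quoted above
(not a replacement for the routed binomial fix), a sketch that prints the
file-descriptor limit each MPI process actually inherits, since limits set in
an interactive shell do not always propagate to remotely launched processes:

/* Sketch: print the file-descriptor limit each MPI process actually sees.
   Useful to confirm whether a ulimit set in the shell really reached the
   remote processes. */
#include <stdio.h>
#include <sys/resource.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    struct rlimit rl;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("rank %d: RLIMIT_NOFILE soft=%lu hard=%lu\n",
               rank, (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

    MPI_Finalize();
    return 0;
}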