[OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
Hi all - and sorry for the multiple postings, but I have more information. 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't happen until the third iteration. I take that to mean that the basic communication works, but that something is saturating. Is there some notion o

[OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
Thanks for your suggestion Gus, we need a way of debugging what is going on. I am pretty sure the problem lies with our cluster configuration. I know MPI simply relies on the underlying network. However, we can ping and ssh to all nodes (and between any pair as well), so it is currently a mystery

[OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?

2011-09-19 Thread Blosch, Edwin L
I am observing differences in floating-point results from an application program that appear to be related to whether I link with OpenMPI 1.4.3 or MVAPICH 1.2.0. Both packages, as well as the application program, were built with the same installation of Intel 11.1; identical flags were passed to the
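
Background note for readers (not part of the original post): floating-point addition is not associative, so anything that changes the order in which partial results are combined, for example a different internal reduction algorithm, can change the low-order bits of a result. A minimal standalone illustration of that order sensitivity:

#include <stdio.h>

int main(void)
{
    /* Floating-point addition is not associative, so grouping matters. */
    double a = 1.0e16, b = -1.0e16, c = 1.0;
    printf("(a + b) + c = %.1f\n", (a + b) + c);   /* prints 1.0 */
    printf("a + (b + c) = %.1f\n", a + (b + c));   /* prints 0.0 */
    return 0;
}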

Re: [OMPI users] RE : MPI hangs on multiple nodes

2011-09-19 Thread Gus Correa
Hi Ole and Eugene. For what it is worth, I tried Ole's program here, as Devendra Rai had done before. I ran it across two nodes, with a total of 16 processes. I tried MCA parameters for openib (InfiniBand), then for tcp on Gigabit Ethernet. Both work. I am using OpenMPI 1.4.3 compiled with GCC 4.1.2 on

[OMPI users] Typo in MPI_Cart_coords man page

2011-09-19 Thread Jeremiah Willcock
The bottom of the MPI_Cart_coords man page (in SVN trunk as well as some releases) states: The inverse mapping, rank-to-coordinates translation is provided by MPI_Cart_coords. Although that is true, we are already in the man page for MPI_Cart_coords and so the reverse is the mapping from coo

Re: [OMPI users] RE : MPI hangs on multiple nodes

2011-09-19 Thread Gus Correa
Hi Eugene. You're right: it is a blocking send, so buffers can be reused after MPI_Send returns. My bad, I only read your answer to Sébastien and Ole after I posted mine. Could MPI run out of [internal] buffers to hold the messages, perhaps? The messages aren't that big anyway [5000 doubles]. Could

Re: [OMPI users] Open MPI and Objective C

2011-09-19 Thread Beatty, Daniel D CIV NAVAIR, 474300D
Greetings Scott, The NSLog call should be no big deal since it is provided by the Cocoa frameworks and is inherent to Objective C. The mpicc compiler may be a different thing. It may not recognize that it is supposed to be calling gcc in Objective-C mode rather than in pure ISO C99 mode.

Re: [OMPI users] custom sparse collective non-reproducible deadlock, MPI_Sendrecv, MPI_Isend/MPI_Irecv or MPI_Send/MPI_Recv question

2011-09-19 Thread Eugene Loh
On 9/18/2011 9:12 AM, Evghenii Gaburov wrote: Hi All, Update to the original posting: METHOD4 also resulted in a deadlock on system HPC2 after a 5-hour run with 32 MPI tasks; also, "const int scale=1;" was missing in the code snippet posted above. --Evghenii
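
For readers following the thread, a generic sketch (not Evghenii's code) of the exchange patterns named in the subject: two ranks that both post a blocking MPI_Send first can deadlock once messages exceed the eager limit, whereas MPI_Sendrecv pairs the send and receive safely. The function and parameter names here are illustrative only:

#include <mpi.h>

/* Illustrative helper, not from the thread: exchange n doubles with a
 * partner rank.  MPI_Sendrecv avoids the ordering deadlock that can
 * occur when both ranks call a blocking MPI_Send first. */
void exchange(double *sendbuf, double *recvbuf, int n, int partner)
{
    MPI_Sendrecv(sendbuf, n, MPI_DOUBLE, partner, 0,
                 recvbuf, n, MPI_DOUBLE, partner, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}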

Re: [OMPI users] RE : MPI hangs on multiple nodes

2011-09-19 Thread Gus Correa
Hi Ole. You could try the examples/connectivity.c program in the OpenMPI source tree to test whether everything is alright. It also hints at how to solve the buffer re-use issue that Sébastien [rightfully] pointed out [i.e., declare separate buffers for MPI_Send and MPI_Recv]. Gus Correa Sébastien Bois
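
A minimal sketch (not Ole's actual test program) of the separate-buffer pattern Gus describes, using the 5000-double message size mentioned elsewhere in the thread; ranks, tags, and sizes are illustrative:

#include <mpi.h>
#include <stdio.h>

#define N 5000

int main(int argc, char *argv[])
{
    int rank, i;
    static double sendbuf[N], recvbuf[N];   /* separate buffers for send and receive */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < N; i++)
        sendbuf[i] = (double) rank;

    if (rank == 0) {
        MPI_Send(sendbuf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(recvbuf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(sendbuf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    printf("Rank %d done\n", rank);
    MPI_Finalize();
    return 0;
}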

Re: [OMPI users] RE : MPI hangs on multiple nodes

2011-09-19 Thread Eugene Loh
Should be fine. Once MPI_Send returns, it should be safe to reuse the buffer. In fact, the return of the call is the only way you have of checking that the message has left the user's send buffer. The case you're worried about is probably MPI_Isend, where you have to check completion with an
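
A short illustrative sketch (not from the thread) of the distinction Eugene draws; the helper name and message size are made up:

#include <mpi.h>

/* Hypothetical helper: contrasts blocking and non-blocking send semantics. */
void send_examples(int dest, int tag)
{
    double buf[5000] = {0};
    MPI_Request req;

    /* Blocking send: once MPI_Send returns, the message has left the
     * user's buffer, so buf may be reused immediately. */
    MPI_Send(buf, 5000, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
    buf[0] = 1.0;                       /* safe */

    /* Non-blocking send: buf must not be modified until completion has
     * been confirmed with MPI_Wait or MPI_Test. */
    MPI_Isend(buf, 5000, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &req);
    /* ... do work that does not touch buf ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* only now is reuse safe */
    buf[0] = 2.0;                       /* safe again */
}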

[OMPI users] RE : MPI hangs on multiple nodes

2011-09-19 Thread Sébastien Boisvert
Hello, Is it safe to re-use the same buffer (variable A) for MPI_Send and MPI_Recv, given that MPI_Send may be eager depending on the MCA parameters? Sébastien. From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Ole

Re: [OMPI users] RE : Problems with MPI_Init_Thread(...)

2011-09-19 Thread Jeff Squyres
On Sep 19, 2011, at 8:37 AM, Sébastien Boisvert wrote: > You need to call MPI_Init before calling MPI_Init_thread. This is incorrect -- MPI_INIT_THREAD does the same job as MPI_INIT, but it allows you to request a specific thread level. > According to http://cw.squyres.com/columns/2004-02-CW-MP
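
A minimal sketch of what Jeff describes: MPI_Init_thread is called instead of MPI_Init (not after it), requesting a thread level and checking what the library actually provides. The requested level here is just an example:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided;

    /* Replaces MPI_Init; do not call both. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "Requested MPI_THREAD_MULTIPLE, got level %d\n", provided);

    /* ... application code ... */

    MPI_Finalize();
    return 0;
}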

Re: [OMPI users] Open MPI and Objective C

2011-09-19 Thread Jeff Squyres
+1 You'll probably have to run "mpicc --showme" to see all the flags that OMPI is passing to the underlying compiler, and pass those (or equivalents) to the ObjC compiler. On Sep 19, 2011, at 8:34 AM, Ralph Castain wrote: > Nothing to do with us - you call a function "NSLog" that Objective C d

[OMPI users] RE : Problems with MPI_Init_Thread(...)

2011-09-19 Thread Sébastien Boisvert
Hello, You need to call MPI_Init before calling MPI_Init_thread. According to http://cw.squyres.com/columns/2004-02-CW-MPI-Mechanic.pdf (past MPI Mechanic columns written by Jeff Squyres), there are only 3 functions that can be called before MPI_Init, and they are: - MPI_Initialized - MPI_Finalized
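
For the second half of the post (the list of calls allowed before initialization), a tiny sketch; note that Jeff Squyres's reply above corrects the claim that MPI_Init must precede MPI_Init_thread:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int flag;

    /* MPI_Initialized may legally be called before MPI_Init. */
    MPI_Initialized(&flag);
    if (!flag)
        MPI_Init(&argc, &argv);

    /* ... application code ... */

    MPI_Finalize();
    return 0;
}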

Re: [OMPI users] Open MPI and Objective C

2011-09-19 Thread Ralph Castain
Nothing to do with us - you call a function "NSLog" that Objective C doesn't recognize. That isn't an MPI function. On Sep 18, 2011, at 8:20 PM, Scott Wilcox wrote: > I have been asked to convert some C++ code using Open MPI to Objective C and > I am having problems getting a simple Obj C progr

Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-19 Thread Yevgeny Kliteynik
On 14-Sep-11 12:59 PM, Jeff Squyres wrote: > On Sep 13, 2011, at 6:33 PM, kevin.buck...@ecs.vuw.ac.nz wrote: > >> there have been two runs of jobs that invoked the mpirun using these >> OpenMPI parameter setting flags (basically, these mimic what I have >> in the global config file) >> >> -mca btl

Re: [OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread devendra rai
Hello Ole. I ran your program on open-mpi-1.4.2 five times, and all five times it finished successfully. So I think the problem was with the version of MPI. Output from your program is attached. I ran on 3 nodes: $home/OpenMPI-1.4.2/bin/mpirun -np 3 -v --output-filename mpi_testfile ./mpi_t

[OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
The test program is available here: http://code.google.com/p/pypar/source/browse/source/mpi_test.c Hopefully, someone can help us troubleshoot why communications stop when multiple nodes are involved and CPU usage goes to 100% for as long as we leave the program running. Many thanks Ole Nielsen

[OMPI users] unsubscribe

2011-09-19 Thread Lane, William
Please unsubscribe me from this mailing list. Thank you, -Bill Lane From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Ole Nielsen [ole.moller.niel...@gmail.com] Sent: Monday, September 19, 2011 1:39 AM To: us...@open-mpi.org Subject: Re: [OMP

Re: [OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
Further to the posting below, I can report that the test program (attached - this time correctly) is chewing up CPU time on both compute nodes for as long as I care to let it continue. It would appear that it hangs in MPI_Recv, which is the next command after the print statements in the test program. Has an

[OMPI users] Problems with MPI_Init_Thread(...)

2011-09-19 Thread devendra rai
Hello Community, I am building an application which uses MPI_Ssend(...) and MPI_Recv(...) in threads. So, there is more than one thread which invokes MPI functions. Based on Jeff's inputs, I rebuilt open-mpi with threads support: ./configure --enable-mpi-threads=yes --with-threads=posix ...

[OMPI users] MPI hangs on multiple nodes

2011-09-19 Thread Ole Nielsen
Hi all. We have been using OpenMPI for many years with Ubuntu on our 20-node cluster. Each node has two quad-core CPUs, so we usually run up to 8 processes on each node, up to a maximum of 160 processes. However, we just upgraded the cluster to Ubuntu 11.04 with Open MPI 1.4.3 and have come across a