Sorry for the long delay on my behalf too. Using MPI_Init_thread with MPI_THREAD_MULTIPLE fixes this problem! Should have had a closer look at the documentation...
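For the record, here is the gist of the change as a minimal sketch (this is not the actual mpitest.c; the error handling is just illustrative):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int provided;

        /* Request full thread support instead of calling MPI_Init(). */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        /* The library may grant less than requested, e.g. if it was built
           without thread support, so check before relying on it. */
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n",
                    provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* Thread (i) can now sit in MPI_Waitany on its Isend/Irecv requests
           while thread (ii) calls MPI_Allreduce. */

        MPI_Finalize();
        return 0;
    }

As Jeff notes below, only some of Open MPI's transports are THREAD_MULTIPLE safe, so it seems worth checking "provided" rather than assuming it.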
Cheers,
Pedro

> Sorry for the delay in replying.
>
> I think you need to use MPI_INIT_THREAD with a level of
> MPI_THREAD_MULTIPLE instead of MPI_INIT. This sets up internal locking
> in Open MPI to protect against multiple threads inside the progress
> engine, etc.
>
> Be aware that only some of Open MPI's transports are THREAD_MULTIPLE
> safe -- see the README for more detail.
>
> On Oct 23, 2011, at 1:11 PM, Pedro Gonnet wrote:
>
> > Hi again,
> >
> > As promised, I implemented a small program reproducing the error.
> >
> > The program's main routine spawns a pthread which calls the function
> > "exchange". "exchange" uses MPI_Isend/MPI_Irecv/MPI_Waitany to
> > exchange a buffer of double-precision numbers with all other nodes.
> >
> > At the same time, the "main" routine exchanges the sum of all the
> > buffers using MPI_Allreduce.
> >
> > To compile and run the program, do the following:
> >
> >     mpicc -g -Wall mpitest.c -pthread
> >     mpirun -np 8 ./a.out
> >
> > Timing is, of course, of the essence and you may have to run the
> > program a few times or twiddle with the value of "usleep" in line 146
> > for it to hang. To see where things go bad, you can do the following:
> >
> >     mpirun -np 8 xterm -e gdb -ex run ./a.out
> >
> > Things go bad when MPI_Allreduce is called while any of the threads
> > are in MPI_Waitany. The value of "usleep" in line 146 should be long
> > enough for all the nodes to have started exchanging data but small
> > enough so that they are not done yet.
> >
> > Cheers,
> > Pedro
> >
> > On Thu, 2011-10-20 at 11:25 +0100, Pedro Gonnet wrote:
> >> Short update:
> >>
> >> I just installed version 1.4.4 from source (compiled with
> >> --enable-mpi-threads), and the problem persists.
> >>
> >> I should also point out that if, in thread (ii), I wait for the
> >> nonblocking communication in thread (i) to finish, nothing bad
> >> happens. But this makes the nonblocking communication somewhat
> >> pointless.
> >>
> >> Cheers,
> >> Pedro
> >>
> >> On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote:
> >>> Hi all,
> >>>
> >>> I am currently working on a multi-threaded hybrid parallel
> >>> simulation which uses both pthreads and Open MPI. The simulation
> >>> uses several pthreads per MPI node.
> >>>
> >>> My code uses the nonblocking routines MPI_Isend/MPI_Irecv/MPI_Waitany
> >>> quite successfully to implement the node-to-node communication.
> >>> When I try to interleave other computations during this
> >>> communication, however, bad things happen.
> >>>
> >>> I have two MPI nodes with two threads each: one thread (i) doing
> >>> the nonblocking communication and the other (ii) doing other
> >>> computations. At some point, the threads (ii) need to exchange data
> >>> using MPI_Allreduce, which fails if the first thread (i) has not
> >>> completed all the communication, i.e. if thread (i) is still in
> >>> MPI_Waitany.
> >>>
> >>> Using the in-place MPI_Allreduce, I get a re-run of this bug:
> >>> http://www.open-mpi.org/community/lists/users/2011/09/17432.php.
> >>> If I don't use in-place, the call to MPI_Waitany (thread i) on one
> >>> of the MPI nodes waits forever.
> >>>
> >>> My guess is that when thread (ii) calls MPI_Allreduce, it gets
> >>> whatever the other node sent with MPI_Isend to thread (i), drops
> >>> whatever it should have been getting from the other node's
> >>> MPI_Allreduce, and the call to MPI_Waitany hangs.
> >>>
> >>> Is this a known issue?
> >>> Is MPI_Allreduce not designed to work alongside the nonblocking
> >>> routines? Is there a "safe" variant of MPI_Allreduce I should be
> >>> using instead?
> >>>
> >>> I am using Open MPI version 1.4.3 (version 1.4.3-1ubuntu3 of the
> >>> package openmpi-bin in Ubuntu). Both MPI nodes are run on the same
> >>> dual-core computer (Lenovo x201 laptop).
> >>>
> >>> If you need more information, please do let me know! I'll also try
> >>> to cook up a small program reproducing this problem...
> >>>
> >>> Cheers and kind regards,
> >>> Pedro
> >
> > <mpitest.c>_______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
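For readers of the archive who don't have the attachment handy, the pattern being discussed looks roughly like the sketch below. It is only an illustration, not the attached mpitest.c; the buffer size, tag, usleep value and error handling are made up, and it needs MPI_THREAD_MULTIPLE as discussed above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <pthread.h>
    #include <mpi.h>

    #define N 1000               /* doubles per peer -- made-up value */

    static int rank, size;
    static double buf[N];

    /* Thread (i): exchange "buf" with every other rank using
       MPI_Isend/MPI_Irecv and drain the requests with MPI_Waitany. */
    static void *exchange(void *arg) {
        MPI_Request *reqs = malloc(2 * (size - 1) * sizeof *reqs);
        double *in = malloc((size - 1) * N * sizeof *in);
        int k, nr = 0, nb = 0, idx, done;
        (void)arg;

        for (k = 0; k < size; k++) {
            if (k == rank)
                continue;
            MPI_Isend(buf, N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD, &reqs[nr++]);
            MPI_Irecv(&in[nb++ * N], N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD,
                      &reqs[nr++]);
        }
        for (done = 0; done < nr; done++)
            MPI_Waitany(nr, reqs, &idx, MPI_STATUS_IGNORE);

        free(in);
        free(reqs);
        return NULL;
    }

    int main(int argc, char *argv[]) {
        int provided, i;
        double sum = 0.0, total;
        pthread_t tid;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE)
            MPI_Abort(MPI_COMM_WORLD, 1);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (i = 0; i < N; i++) {
            buf[i] = rank + 1.0;
            sum += buf[i];
        }

        /* Thread (i) starts the nonblocking exchange... */
        pthread_create(&tid, NULL, exchange, NULL);

        /* ...while thread (ii) -- here simply the main thread -- does the
           reduction. The usleep only makes it likely that thread (i) is
           still inside MPI_Waitany when MPI_Allreduce is called. */
        usleep(1000);
        MPI_Allreduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        pthread_join(tid, NULL);
        if (rank == 0)
            printf("total = %g\n", total);

        MPI_Finalize();
        return 0;
    }

With MPI_Init (i.e. only MPI_THREAD_SINGLE) this kind of concurrent MPI_Waitany/MPI_Allreduce is exactly what hangs; with MPI_THREAD_MULTIPLE it is allowed.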