Glad it worked!

On Nov 13, 2011, at 6:15 AM, Pedro Gonnet wrote:
> Sorry for the long delay on my behalf too.
>
> Using MPI_Init_thread with MPI_THREAD_MULTIPLE fixes this problem!
> Should have had a closer look at the documentation...
>
> Cheers,
> Pedro
>
>
>> Sorry for the delay in replying.
>>
>> I think you need to use MPI_INIT_THREAD with a level of
>> MPI_THREAD_MULTIPLE instead of MPI_INIT. This sets up internal locking
>> in Open MPI to protect against multiple threads inside the progress
>> engine, etc.
>>
>> Be aware that only some of Open MPI's transports are THREAD_MULTIPLE
>> safe -- see the README for more detail.
>>
>> On Oct 23, 2011, at 1:11 PM, Pedro Gonnet wrote:
>>
>>> Hi again,
>>>
>>> As promised, I implemented a small program reproducing the error.
>>>
>>> The program's main routine spawns a pthread which calls the function
>>> "exchange". "exchange" uses MPI_Isend/MPI_Irecv/MPI_Waitany to exchange
>>> a buffer of double-precision numbers with all other nodes.
>>>
>>> At the same time, the "main" routine exchanges the sum of all the
>>> buffers using MPI_Allreduce.
>>>
>>> To compile and run the program, do the following:
>>>
>>> mpicc -g -Wall mpitest.c -pthread
>>> mpirun -np 8 ./a.out
>>>
>>> Timing is, of course, of the essence and you may have to run the program
>>> a few times or twiddle with the value of "usleep" in line 146 for it to
>>> hang. To see where things go bad, you can do the following:
>>>
>>> mpirun -np 8 xterm -e gdb -ex run ./a.out
>>>
>>> Things go bad when MPI_Allreduce is called while any of the threads are
>>> in MPI_Waitany. The value of "usleep" in line 146 should be long enough
>>> for all the nodes to have started exchanging data but small enough so
>>> that they are not done yet.
>>>
>>> Cheers,
>>> Pedro
>>>
>>>
>>> On Thu, 2011-10-20 at 11:25 +0100, Pedro Gonnet wrote:
>>>> Short update:
>>>>
>>>> I just installed version 1.4.4 from source (compiled with
>>>> --enable-mpi-threads), and the problem persists.
>>>>
>>>> I should also point out that if, in thread (ii), I wait for the
>>>> nonblocking communication in thread (i) to finish, nothing bad happens.
>>>> But this makes the nonblocking communication somewhat pointless.
>>>>
>>>> Cheers,
>>>> Pedro
>>>>
>>>>
>>>> On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote:
>>>>> Hi all,
>>>>>
>>>>> I am currently working on a multi-threaded hybrid parallel simulation
>>>>> which uses both pthreads and Open MPI. The simulation uses several
>>>>> pthreads per MPI node.
>>>>>
>>>>> My code uses the nonblocking routines MPI_Isend/MPI_Irecv/MPI_Waitany
>>>>> quite successfully to implement the node-to-node communication. When I
>>>>> try to interleave other computations during this communication,
>>>>> however, bad things happen.
>>>>>
>>>>> I have two MPI nodes with two threads each: one thread (i) doing the
>>>>> nonblocking communication and the other (ii) doing other computations.
>>>>> At some point, the threads (ii) need to exchange data using
>>>>> MPI_Allreduce, which fails if the first thread (i) has not completed
>>>>> all the communication, i.e. if thread (i) is still in MPI_Waitany.
>>>>>
>>>>> Using the in-place MPI_Allreduce, I get a re-run of this bug:
>>>>> http://www.open-mpi.org/community/lists/users/2011/09/17432.php. If I
>>>>> don't use in-place, the call to MPI_Waitany (thread i) on one of the
>>>>> MPI nodes waits forever.
>>>>>
>>>>> My guess is that when thread (ii) calls MPI_Allreduce, it gets
>>>>> whatever the other node sent with MPI_Isend to thread (i), drops
>>>>> whatever it should have been getting from the other node's
>>>>> MPI_Allreduce, and the call to MPI_Waitany hangs.
>>>>>
>>>>> Is this a known issue? Is MPI_Allreduce not designed to work alongside
>>>>> the nonblocking routines? Is there a "safe" variant of MPI_Allreduce I
>>>>> should be using instead?
>>>>>
>>>>> I am using Open MPI version 1.4.3 (version 1.4.3-1ubuntu3 of the
>>>>> package openmpi-bin in Ubuntu). Both MPI nodes are run on the same
>>>>> dual-core computer (Lenovo x201 laptop).
>>>>>
>>>>> If you need more information, please do let me know! I'll also try to
>>>>> cook up a small program reproducing this problem...
>>>>>
>>>>> Cheers and kind regards,
>>>>> Pedro
>>>
>>> <mpitest.c>

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
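
For reference, the fix discussed at the top of the thread amounts to requesting full thread support at initialization and then checking what the library actually granted. A minimal sketch using only standard MPI calls (the error handling is illustrative, not taken from Pedro's code):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int provided;

    /* Request MPI_THREAD_MULTIPLE so several threads may call MPI
     * concurrently, e.g. MPI_Waitany in one thread and MPI_Allreduce
     * in another. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* The library may grant less than requested; check before relying
     * on concurrent calls (only some Open MPI transports are
     * THREAD_MULTIPLE safe -- see the README). */
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    /* ... threaded communication as described in the thread ... */

    MPI_Finalize();
    return 0;
}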
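
And a rough sketch of the two-thread pattern the report describes: a pthread draining MPI_Isend/MPI_Irecv requests with MPI_Waitany while the main thread enters MPI_Allreduce. This is not the attached mpitest.c; the buffer length, tag, and variable names are made up, and the usleep timing trick is omitted.

#include <mpi.h>
#include <pthread.h>
#include <stdlib.h>

#define N 1024                    /* doubles per peer (made-up value) */

static int rank, size;
static double *sendbuf, *recvbuf;

/* Thread (i): nonblocking point-to-point exchange with all other ranks. */
static void *exchange(void *arg) {
    MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(*reqs));
    int nreq = 0, k, idx;

    (void)arg;
    for (k = 0; k < size; k++) {
        if (k == rank)
            continue;
        MPI_Isend(&sendbuf[k * N], N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD, &reqs[nreq++]);
        MPI_Irecv(&recvbuf[k * N], N, MPI_DOUBLE, k, 0, MPI_COMM_WORLD, &reqs[nreq++]);
    }

    /* Drain the requests one at a time, as in the report. */
    for (k = 0; k < nreq; k++)
        MPI_Waitany(nreq, reqs, &idx, MPI_STATUS_IGNORE);

    free(reqs);
    return NULL;
}

int main(int argc, char **argv) {
    int provided;
    double local_sum = 0.0, global_sum;
    pthread_t tid;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    sendbuf = calloc((size_t)size * N, sizeof(double));
    recvbuf = calloc((size_t)size * N, sizeof(double));

    /* Thread (ii) is the main thread here: it reduces while (i) exchanges.
     * With plain MPI_Init this is exactly the combination that hung. */
    pthread_create(&tid, NULL, exchange, NULL);
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    pthread_join(tid, NULL);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Build and run as in the thread: mpicc -g -Wall sketch.c -pthread, then mpirun -np 8 ./a.out.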