Glad it worked!

On Nov 13, 2011, at 6:15 AM, Pedro Gonnet wrote:

> 
> Sorry for the long delay on my behalf too.
> 
> Using MPI_Init_thread with MPI_THREAD_MULTIPLE fixes this problem!
> Should have had a closer look at the documentation...
> 
> Cheers,
> Pedro
> 
> 
> 
>> Sorry for the delay in replying. 
>> I think you need to use MPI_INIT_THREAD with a level of
>> MPI_THREAD_MULTIPLE instead of MPI_INIT. This sets up internal locking
>> in Open MPI to protect against multiple threads inside the progress
>> engine, etc. 
>> Be aware that only some of Open MPI's transports are THREAD_MULTIPLE
>> safe -- see the README for more detail. 
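>> 
For reference, a minimal sketch of the suggested initialization change (illustrative only, not the poster's actual code; the error handling and messages are assumptions):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int provided;

    /* Request full multi-threaded support instead of calling MPI_Init(). */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* The library may grant less than requested; check "provided" before
       making MPI calls from more than one pthread concurrently. */
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }

    /* ... spawn pthreads, do MPI_Isend/MPI_Irecv/MPI_Waitany and
       MPI_Allreduce from separate threads here ... */

    MPI_Finalize();
    return 0;
}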
>> On Oct 23, 2011, at 1:11 PM, Pedro Gonnet wrote: 
>>> 
>>> Hi again, 
>>> 
>>> As promised, I implemented a small program reproducing the error. 
>>> 
>>> The program's main routine spawns a pthread which calls the function 
>>> "exchange". "exchange" uses MPI_Isend/MPI_Irecv/MPI_Waitany to exchange 
>>> a buffer of double-precision numbers with all other nodes. 
>>> 
>>> At the same time, the "main" routine exchanges the sum of all the 
>>> buffers using MPI_Allreduce. 
>>> 
>>> To compile and run the program, do the following: 
>>> 
>>> mpicc -g -Wall mpitest.c -pthread 
>>> mpirun -np 8 ./a.out 
>>> 
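The attachment mpitest.c is not reproduced in this archive; the following is only a rough, hypothetical sketch of the structure described above (buffer sizes, names and the usleep value are illustrative). Note that it initializes MPI with plain MPI_Init, as in the original report, which is what turns out to be the problem:

#include <mpi.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define N 1024

static double buf[N];        /* data this rank sends to every other rank */
static double *recvbufs;     /* N doubles of receive space per peer rank */
static int rank, size;

/* Runs in a separate pthread: exchange "buf" with all other ranks using
   nonblocking sends/receives, then drain the requests with MPI_Waitany. */
static void *exchange(void *arg) {
    int nreq = 2 * (size - 1), k = 0, i, r, idx;
    MPI_Request *reqs = malloc(nreq * sizeof *reqs);
    for (r = 0; r < size; r++) {
        if (r == rank) continue;
        MPI_Isend(buf, N, MPI_DOUBLE, r, 0, MPI_COMM_WORLD, &reqs[k++]);
        MPI_Irecv(&recvbufs[(size_t)r * N], N, MPI_DOUBLE, r, 0,
                  MPI_COMM_WORLD, &reqs[k++]);
    }
    for (i = 0; i < nreq; i++)
        MPI_Waitany(nreq, reqs, &idx, MPI_STATUS_IGNORE);
    free(reqs);
    return arg;
}

int main(int argc, char *argv[]) {
    pthread_t thread;
    double sum = 0.0, total;
    int i;

    MPI_Init(&argc, &argv);   /* the fix discussed above is to use
                                 MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...) */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < N; i++) buf[i] = rank;
    recvbufs = malloc((size_t)size * N * sizeof *recvbufs);

    /* Thread (i): nonblocking point-to-point exchange. */
    pthread_create(&thread, NULL, exchange, NULL);

    /* Thread (ii), i.e. the main thread: give the exchange a head start,
       then call a collective while the other thread may still be in
       MPI_Waitany -- this is where the reported hang occurs. */
    usleep(100);
    for (i = 0; i < N; i++) sum += buf[i];
    MPI_Allreduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    pthread_join(thread, NULL);
    free(recvbufs);
    MPI_Finalize();
    return 0;
}
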
>>> Timing is, of course, of the essence and you may have to run the program 
>>> a few times or twiddle with the value of "usleep" in line 146 for it to 
>>> hang. To see where things go bad, you can do the following: 
>>> 
>>> mpirun -np 8 xterm -e gdb -ex run ./a.out 
>>> 
>>> Things go bad when MPI_Allreduce is called while any of the threads are 
>>> in MPI_Waitany. The value of "usleep" in line 146 should be long enough 
>>> for all the nodes to have started exchanging data but small enough so 
>>> that they are not done yet. 
>>> 
>>> Cheers, 
>>> Pedro 
>>> 
>>> 
>>> 
>>> On Thu, 2011-10-20 at 11:25 +0100, Pedro Gonnet wrote: 
>>>> Short update: 
>>>> 
>>>> I just installed version 1.4.4 from source (compiled with 
>>>> --enable-mpi-threads), and the problem persists. 
>>>> 
>>>> I should also point out that if, in thread (ii), I wait for the 
>>>> nonblocking communication in thread (i) to finish, nothing bad happens. 
>>>> But this makes the nonblocking communication somewhat pointless. 
>>>> 
>>>> Cheers, 
>>>> Pedro 
>>>> 
>>>> 
>>>> On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote: 
>>>>> Hi all, 
>>>>> 
>>>>> I am currently working on a multi-threaded hybrid parallel simulation 
>>>>> which uses both pthreads and OpenMPI. The simulation uses several 
>>>>> pthreads per MPI node. 
>>>>> 
>>>>> My code uses the nonblocking routines MPI_Isend/MPI_Irecv/MPI_Waitany 
>>>>> quite successfully to implement the node-to-node communication. When I 
>>>>> try to interleave other computations during this communication, however, 
>>>>> bad things happen. 
>>>>> 
>>>>> I have two MPI nodes with two threads each: one thread (i) doing the 
>>>>> nonblocking communication and the other (ii) doing other computations. 
>>>>> At some point, the threads (ii) need to exchange data using 
>>>>> MPI_Allreduce, which fails if the first thread (i) has not completed all 
>>>>> the communication, i.e. if thread (i) is still in MPI_Waitany. 
>>>>> 
>>>>> Using the in-place MPI_Allreduce, I get a re-run of this bug: 
>>>>> http://www.open-mpi.org/community/lists/users/2011/09/17432.php. If I 
>>>>> don't use in-place, the call to MPI_Waitany (thread ii) on one of the 
>>>>> MPI nodes waits forever. 
>>>>> 
>>>>> My guess is that when thread (ii) calls MPI_Allreduce, it gets 
>>>>> whatever the other node sent with MPI_Isend to thread (i), drops 
>>>>> whatever it should have been getting from the other node's 
>>>>> MPI_Allreduce, and the call to MPI_Waitany hangs. 
>>>>> 
>>>>> Is this a known issue? Is MPI_Allreduce not designed to work alongside 
>>>>> the nonblocking routines? Is there a "safe" variant of MPI_Allreduce I 
>>>>> should be using instead? 
>>>>> 
>>>>> I am using OpenMPI version 1.4.3 (version 1.4.3-1ubuntu3 of the package 
>>>>> openmpi-bin in Ubuntu). Both MPI nodes are run on the same dual-core 
>>>>> computer (Lenovo x201 laptop). 
>>>>> 
>>>>> If you need more information, please do let me know! I'll also try to 
>>>>> cook up a small program reproducing this problem... 
>>>>> 
>>>>> Cheers and kind regards, 
>>>>> Pedro 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> <mpitest.c>_______________________________________________ 
>>> users mailing list 
>>> users_at_[hidden] 
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users 
>> -- 
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
