I am using OpenMPI 1.8.3 on a Linux cluster to run fairly long CFD 
(computational fluid dynamics) simulations using 16 MPI processes. The 
calculations last several days and typically involve millions of MPI exchanges. 
I use the Intel Fortran compiler, and when I compile with the -openmp option
and run with only one OpenMP thread per MPI process, I tend to get deadlocks
after several days of computing. These deadlocks occur in only about 1 out of
10 calculations. I have eliminated things like network glitches, power spikes,
etc., as possibilities. The only factor left is the inclusion of the OpenMP
option, even though I am running with just one OpenMP thread per MPI process.
I have read about the issues surrounding MPI_INIT_THREAD, and I have reduced
the REQUIRED level of thread support from MPI_THREAD_SERIALIZED, which my
application does not actually need, to MPI_THREAD_FUNNELED. I think the
reduction in the level of support has helped, but not completely eliminated,
the deadlocking. Of course, there is always the possibility that I have coded
my MPI calls improperly, even though the code runs for days on end. Maybe
there is a one-in-a-million chance that some rank gets so far ahead of all the
other ranks that a deadlock occurs, but placing MPI_BARRIER calls has not
revealed any such situation.
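
For reference, here is a minimal sketch of the kind of initialization I am
describing (simplified, not my actual code); it requests FUNNELED and checks
the level the library actually provides:

   program init_thread_sketch
      use mpi
      implicit none
      integer :: required, provided, ierr

      ! Request FUNNELED: only the thread that calls MPI_INIT_THREAD
      ! will make MPI calls, even though OpenMP threads may exist.
      required = MPI_THREAD_FUNNELED
      call MPI_INIT_THREAD(required, provided, ierr)

      ! The library may grant less than requested; the levels are
      ! ordered integer constants, so a simple comparison works.
      if (provided < required) then
         write(*,*) 'MPI provided a lower thread level than requested'
      end if

      ! ... CFD work and MPI exchanges, all from the main thread ...

      call MPI_FINALIZE(ierr)
   end program init_thread_sketch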

So I have two questions. First, has anyone experienced something similar,
where including OpenMP in an MPI code has caused deadlocks? Second, is it
possible that reducing the REQUIRED level of support to MPI_THREAD_SINGLE
would cause the code to behave differently than it does with FUNNELED? I have
read in another post that SINGLE and FUNNELED are essentially the same thing.
I have even noticed that I can still spawn OpenMP threads when I use SINGLE,
as the sketch just below shows.
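
To illustrate that last point, here is another minimal sketch (again, not my
actual code): MPI is initialized with MPI_THREAD_SINGLE, yet OpenMP still
spawns threads. As far as I understand, the requested level is a promise my
code makes to MPI about which threads will call it, not something MPI
enforces on OpenMP:

   program single_sketch
      use mpi
      use omp_lib
      implicit none
      integer :: provided, ierr, rank

      ! Request only SINGLE. OpenMP threads can still be created;
      ! the promise is simply that only this thread ever calls MPI.
      call MPI_INIT_THREAD(MPI_THREAD_SINGLE, provided, ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

      !$omp parallel
      write(*,*) 'rank', rank, 'thread', omp_get_thread_num()
      !$omp end parallel

      ! All MPI calls stay on the initial thread, outside parallel regions.
      call MPI_FINALIZE(ierr)
   end program single_sketch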

Thanks

Kevin McGrattan
National Institute of Standards and Technology
100 Bureau Drive, Mail Stop 8664
Gaithersburg, Maryland 20899

301 975 2712
