Hi, Jeff:

I wish I had your problems reproducing this. The problem apparently rears its head when OpenMPI is compiled with the Intel compilers as well, but only ~1% of the time. Unfortunately, we have users who launch ~1400 single-node jobs at a go, so they see on the order of a dozen or two jobs hang per suite of simulations when using the defaults; the problem goes away when they use -mca btl self,tcp, or when they use sm but set the number of fifos to np-1.

At first I had assumed it was a new-ish-architecture thing, as we first saw the problem on the Nehalem Xeon E5540 nodes, but the sample program hangs in exactly the same way on a Harpertown (E5430) machine as well. So I've been assuming that this is a real problem that for whatever reason is just exposed more with this particular version of this particular compiler. I'd love to be wrong and for it to be something strange but easily changed in our environment that is causing this.

Running with your suggested test change, i.e.
       leftneighbour = rank-1
       if (leftneighbour .eq. -1) then
!          leftneighbour = nprocs-1
          leftneighbour = MPI_PROC_NULL
       endif
       rightneighbour = rank+1
       if (rightneighbour .eq. nprocs) then
!          rightneighbour = 0
          rightneighbour = MPI_PROC_NULL
       endif

like so:
mpirun -np 6 -mca btl self,sm,tcp ./diffusion-mpi
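
(For context, the communication in the main loop is just a pair of MPI_SENDRECVs per timestep. The following is a minimal, self-contained sketch of that pattern with the MPI_PROC_NULL change folded in; it's a reconstruction for illustration, not the actual diffusion-mpi source, so the program name, buffer sizes, and step count are made up.)

   program ringsketch
      use mpi
      implicit none
      integer :: rank, nprocs, ierr, step
      integer :: leftneighbour, rightneighbour
      integer :: status(MPI_STATUS_SIZE)
      double precision :: sendval, recvleft, recvright
      integer, parameter :: nsteps = 100000

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

      ! non-periodic neighbours: the end ranks talk to MPI_PROC_NULL
      leftneighbour = rank-1
      if (leftneighbour .eq. -1) leftneighbour = MPI_PROC_NULL
      rightneighbour = rank+1
      if (rightneighbour .eq. nprocs) rightneighbour = MPI_PROC_NULL

      sendval = dble(rank)
      do step = 1, nsteps
         ! swap guardcells: send right / receive from the left, ...
         call MPI_SENDRECV(sendval, 1, MPI_DOUBLE_PRECISION, rightneighbour, 1, &
                           recvleft, 1, MPI_DOUBLE_PRECISION, leftneighbour, 1, &
                           MPI_COMM_WORLD, status, ierr)
         ! ... then send left / receive from the right
         call MPI_SENDRECV(sendval, 1, MPI_DOUBLE_PRECISION, leftneighbour, 2, &
                           recvright, 1, MPI_DOUBLE_PRECISION, rightneighbour, 2, &
                           MPI_COMM_WORLD, status, ierr)
      enddo

      call MPI_FINALIZE(ierr)
   end program ringsketch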

I do seem to get different behaviour. With OpenMPI 1.3.2, the program frequently runs to completion, but when it does so it hangs at the end, which hadn't happened before; attaching gdb to a process tells me that it's hanging in mpi_finalize:
(gdb) where
#0  0x00002b3635ecb51f in poll () from /lib64/libc.so.6
#1  0x00002b3634bd87c1 in poll_dispatch () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/libopen-pal.so.0
#2  0x00002b3634bd7659 in opal_event_base_loop () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/libopen-pal.so.0
#3  0x00002b3634bcc189 in opal_progress () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/libopen-pal.so.0
#4  0x00002b3636d7cf15 in barrier () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/openmpi/mca_grpcomm_bad.so
#5  0x00002b363470158b in ompi_mpi_finalize () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/libmpi.so.0
#6  0x00002b36344bb529 in pmpi_finalize__ () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/libmpi_f77.so.0
#7  0x0000000000400f99 in MAIN__ ()
#8 0x0000000000400fda in main (argc=1, argv=0x7fff3e3908c8) at ../../../gcc-4.4.0/libgfortran/fmain.c:21
(gdb)

The rest of the time (maybe 1/4 of the time?) it hangs mid-run, in the sendrecv:
(gdb) where
#0  0x00002b2bb44b4230 in mca_pml_ob1_send () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/openmpi/mca_pml_ob1.so
#1  0x00002b2baf47d296 in PMPI_Sendrecv () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/libmpi.so.0
#2  0x00002b2baf215540 in pmpi_sendrecv__ () from /scinet/gpc/mpi/openmpi/1.3.2-gcc-v4.4.0-ofed/lib/libmpi_f77.so.0
#3  0x0000000000400ea6 in MAIN__ ()
#4 0x0000000000400fda in main (argc=1, argv=0x7fff62d9b9c8) at ../../../gcc-4.4.0/libgfortran/fmain.c:21


When running with OpenMPI 1.3.3, I get hangs significantly _more_ often with this change than before, typically in the sendrecv again:

#0  0x00002aeb89d6cf2b in mca_btl_sm_component_progress () from /scinet/gpc/mpi/openmpi/1.3.3-gcc-v4.4.0-ofed/lib/openmpi/mca_btl_sm.so
#1  0x00002aeb849bd14a in opal_progress () from /scinet/gpc/mpi/openmpi/1.3.3-gcc-v4.4.0-ofed/lib/libopen-pal.so.0
#2  0x00002aeb8954f235 in mca_pml_ob1_send () from /scinet/gpc/mpi/openmpi/1.3.3-gcc-v4.4.0-ofed/lib/openmpi/mca_pml_ob1.so
#3  0x00002aeb84516586 in PMPI_Sendrecv () from /scinet/gpc/mpi/openmpi/1.3.3-gcc-v4.4.0-ofed/lib/libmpi.so.0
#4  0x00002aeb842ae5b0 in pmpi_sendrecv__ () from /scinet/gpc/mpi/openmpi/1.3.3-gcc-v4.4.0-ofed/lib/libmpi_f77.so.0
#5  0x0000000000400ea6 in MAIN__ ()
#6 0x0000000000400fda in main (argc=1, argv=0x7fff12a13068) at ../../../gcc-4.4.0/libgfortran/fmain.c:21

but occasionally in the finalize again, and (unlike with 1.3.2) there are occasional successful runs all the way through to completion.

Again, running the program with both versions of OpenMPI without sm,
mpirun -np 6 -mca btl self,tcp  ./diffusion-mpi

or with num_fifos=(np-1):
mpirun -np 6 -mca btl self,sm -mca btl_sm_num_fifos 5 ./diffusion-mpi

seems to work fine.
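
(In case it's useful to anyone else: as I understand it, we could also make that workaround the default for our users via an MCA parameters file rather than per-job mpirun flags; something like the following in ~/.openmpi/mca-params.conf, assuming the usual per-user params file location, and with the fifo count of 7 just being np-1 for our 8-core nodes.)

   # Per-user Open MPI MCA defaults (~/.openmpi/mca-params.conf)
   # Either avoid the sm btl entirely:
   btl = self,tcp
   # ...or, alternatively, keep sm but give it np-1 fifos
   # (e.g. 7 on an 8-core node):
   # btl_sm_num_fifos = 7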

        - Jonathan

On 2009-09-22, at 8:52PM, Jeff Squyres wrote:

Jonathan --

Sorry for the delay in replying; thanks for posting again.

I'm actually unable to replicate your problem. :-( I have a new Intel 8-core X5570 box; I'm running at np=6 and np=8 on both Open MPI 1.3.2 and 1.3.3 and am not seeing the problem you're seeing. I even made your sample program worse: I made a and b 100,000-element real arrays (increasing the count args in MPI_SENDRECV to 100,000 as well) and increased nsteps to 150,000,000. No hangs. :-\

The version of the compiler *usually* isn't significant, so gcc 4.x should be fine.

Yes, the sm flow control issue was a significant fix, but the blocking nature of MPI_SENDRECV means that you shouldn't have run into the problems that were fixed (the main issues had to do with fast senders exhausting resources of slow receivers -- but MPI_SENDRECV is synchronous so the senders should always be matching the speed of the receivers).
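
(Just to illustrate what I mean, and definitely not your code: the kind of pattern those fixes targeted looks more like the sketch below, where one side keeps posting small sends without ever having to wait for the other; names and counts here are made up for illustration.)

   program floodsketch
      ! Illustrative only: a fast-sender / slow-receiver pattern of the kind
      ! the 1.3.2 sm flow-control fixes were aimed at.  Run with np >= 2.
      use mpi
      implicit none
      integer :: rank, nprocs, ierr, i
      integer :: status(MPI_STATUS_SIZE)
      double precision :: val

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

      if (nprocs .ge. 2) then
         if (rank .eq. 0) then
            ! small sends can complete eagerly, so rank 0 can run far
            ! ahead of rank 1 without blocking
            do i = 1, 1000000
               val = dble(i)
               call MPI_SEND(val, 1, MPI_DOUBLE_PRECISION, 1, 0, &
                             MPI_COMM_WORLD, ierr)
            enddo
         else if (rank .eq. 1) then
            ! rank 1 posts matching receives; if it runs slower, the
            ! unreceived messages pile up on its side
            do i = 1, 1000000
               call MPI_RECV(val, 1, MPI_DOUBLE_PRECISION, 0, 0, &
                             MPI_COMM_WORLD, status, ierr)
            enddo
         endif
      endif

      call MPI_FINALIZE(ierr)
   end program floodsketch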

Just for giggles, what happens if you change

     if (leftneighbour .eq. -1) then
        leftneighbour = nprocs-1
     endif
     if (rightneighbour .eq. nprocs) then
        rightneighbour = 0
     endif

to

     if (leftneighbour .eq. -1) then
        leftneighbour = MPI_PROC_NULL
     endif
     if (rightneighbour .eq. nprocs) then
        rightneighbour = MPI_PROC_NULL
     endif



On Sep 21, 2009, at 5:09 PM, Jonathan Dursi wrote:

Continuing the conversation with myself:

Google pointed me to Trac ticket #1944, which spoke of deadlocks in looped collective operations. There is no collective operation anywhere in this sample code, but trying one of the suggested workarounds/clues (setting btl_sm_num_fifos to at least np-1) seems to make things work quite reliably for both OpenMPI 1.3.2 and 1.3.3. That is, while this

mpirun -np 6 -mca btl sm,self ./diffusion-mpi

invariably hangs (at random-seeming numbers of iterations) with OpenMPI 1.3.2 and sometimes hangs (maybe 10% of the time, again seemingly randomly) with 1.3.3,

mpirun -np 6 -mca btl tcp,self ./diffusion-mpi

or

mpirun -np 6 -mca btl_sm_num_fifos 5 -mca btl sm,self ./diffusion-mpi

always succeeds, with (as one might guess) the second being much faster...

        Jonathan

--
Jonathan Dursi     <ljdu...@scinet.utoronto.ca>


--
Jeff Squyres
jsquy...@cisco.com


--
Jonathan Dursi <ljdu...@scinet.utoronto.ca>



