Thanks, Howard. I've attached the ompi_info -c output.
Below are the code snippets that all processes execute. The first one has some number of MPI_REDUCE calls. It's not an IREDUCE; is that what you mean by 'variant'? The second one calls the MPI_ALLREDUCE. All the processes execute both of these regions of the code.

When the job hangs, I notice it after 15 or 20 minutes of no progress. Then I kill one of the processes, and the stack trace indicates that most of the processes are still in the next-to-last MPI_REDUCE (the 3rd of the 4 that you see), but 3 of them are in the MPI_ALLREDUCE. I miscounted earlier when I said the majority of processes were in the 4th MPI_REDUCE out of 5; it was the 3rd out of 4.

You also asked about size. The first two MPI_REDUCE calls in the loop below involve 1 element; the second two calls involve num_quans elements, which is 22 in the case that hangs. I will post some coll_base_verbose output in the next e-mail.

Thanks again,
Ed

Snippet #1:

do k = 1, num_integrations
   if (integration(k)%skip) cycle

   atots_tot = 0.0_fdf
   atots = integration(k)%atots   ! locally accumulated
   call mpi_reduce(atots,atots_tot,1,my_mpi_real,MPI_SUM,0,exec_comm,ierr)
   if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce atots mpi_reduce',ierr)
   integration(k)%atots = atots_tot

   rats_tot = 0.0_fdf
   rats = integration(k)%rats     ! locally accumulated
   call mpi_reduce(rats,rats_tot,1,my_mpi_real,MPI_SUM,0,exec_comm,ierr)
   if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce rats mpi_reduce',ierr)
   integration(k)%rats = rats_tot

   int_data_tot = 0.0_fdf
   call mpi_reduce(integration(k)%int_data,int_data_tot, &
                   integration(k)%num_quans,my_mpi_real,MPI_SUM, &
                   0,exec_comm,ierr)
   if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce int_data mpi_reduce',ierr)
   integration(k)%int_data = int_data_tot

   quan_num_max = 0
   call mpi_reduce(integration(k)%quan_num,quan_num_max, &
                   integration(k)%num_quans,MPI_INTEGER,MPI_MAX, &
                   0,exec_comm,ierr)
   if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce quan_num mpi_reduce',ierr)
   integration(k)%quan_num = quan_num_max
enddo

Snippet #2:

! Everybody gets the information about whether any cells have failed.
itmp(1) = wallfn_runinfo%nwallfn_cells
itmp(2) = wallfn_runinfo%ncells_failed
itmp(3) = wallfn_runinfo%ncells_printed
itmpg = 0
call mpi_allreduce(itmp,itmpg,3,MPI_INTEGER,MPI_SUM,exec_comm,ierr)
if (ierr /= MPI_SUCCESS) call handle_mpi_error('wallfn_runinfo_dump_errors mpi_allreduce',ierr)
g_nwallfn_cells = itmpg(1)
g_ncells_failed = itmpg(2)
g_ncells_printed = itmpg(3)

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, September 26, 2014 4:10 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Application hangs in 1.8.1 related to collective operations

Hello Ed,

Could you post the output of ompi_info? It would also help to know which variant of the collective ops you're doing. If you could post the output when you run with

mpirun --mca coll_base_verbose 10 "other mpirun args you've been using"

that would be great. Also, if you know the sizes (number of elements) involved in the reduce and allreduce operations, it would be helpful to know this as well.

Thanks,

Howard

2014-09-25 3:34 GMT-06:00 Blosch, Edwin L <edwin.l.blo...@lmco.com>:

I had an application suddenly stop making progress. By killing the last process out of 208 processes, then looking at the stack trace, I found 3 of 208 processes were in an MPI_REDUCE call.
The other 205 had progressed in their execution to another routine, where they were waiting in an unrelated MPI_ALLREDUCE call. The code structure is such that each process calls MPI_REDUCE 5 times for different variables, then some work is done, then the MPI_ALLREDUCE call happens early in the next iteration of the solution procedure. I thought it was also noteworthy that the 3 processes stuck at MPI_REDUCE were actually stuck on the 4th of 5 MPI_REDUCE calls, not the 5th call.

There are no issues with MVAPICH. The problem is easily solved by adding an MPI_BARRIER after the section of MPI_REDUCE calls. It seems like MPI_REDUCE has some kind of non-blocking implementation, and it was not safe to enter the MPI_ALLREDUCE while those MPI_REDUCE calls had not yet completed for other processes.

This was in Open MPI 1.8.1. The same problem was seen on 3 slightly different systems, all QDR InfiniBand with Mellanox HCAs, using a Mellanox OFED stack (slightly different versions on each cluster) and Intel compilers (again, slightly different versions on each of the 3 systems).

Has anyone encountered anything similar? While I have a workaround, I want to make sure the root cause of the deadlock gets fixed. Please let me know what I can do to help.

Thanks,
Ed

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/09/25389.php
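For reference, here is a minimal sketch (not taken from the actual application) of the MPI_BARRIER workaround described in the quoted post above. It assumes the same exec_comm communicator and handle_mpi_error routine as in Snippet #1; the point is only to show the barrier sitting between the block of MPI_REDUCE calls and the later MPI_ALLREDUCE, so that no rank can reach the MPI_ALLREDUCE before every rank has left the reduction section.

! Sketch of the workaround: force all ranks to finish the MPI_REDUCE
! section before any of them can proceed toward the MPI_ALLREDUCE in
! the next iteration. exec_comm and handle_mpi_error are assumed to be
! the same as in Snippet #1.
do k = 1, num_integrations
   if (integration(k)%skip) cycle
   ! ... the four mpi_reduce calls from Snippet #1 ...
enddo

call mpi_barrier(exec_comm,ierr)
if (ierr /= MPI_SUCCESS) call handle_mpi_error('post-reduce mpi_barrier',ierr)

! ... later, early in the next solution iteration, the mpi_allreduce
! from Snippet #2 executes as before ...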
Attached ompi_info -c output:

Configured by: bloscel
Configured on: Sun Sep 28 12:34:59 CDT 2014
Configure host: head.cluster
Built by: bloscel
Built on: Sun Sep 28 12:55:50 CDT 2014
Built host: head.cluster
C bindings: yes
C++ bindings: yes
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to limitations in the /applocal/intel/composer_xe_2013/bin/ifort compiler, does not support the following: array subsections, direct passthru (where possible) to underlying Open MPI's C functionality
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: /applocal/intel/composer_xe_2013/bin/icc
C compiler absolute:
C compiler family name: INTEL
C compiler version: 1310.20130514
C char size: 1
C bool size: 1
C short size: 2
C int size: 4
C long size: 8
C float size: 4
C double size: 8
C pointer size: 8
C char align: 1
C bool align: 1
C int align: 4
C float align: 4
C double align: 8
C++ compiler: /applocal/intel/composer_xe_2013/bin/icpc
C++ compiler absolute: none
Fort compiler: /applocal/intel/composer_xe_2013/bin/ifort
Fort compiler abs:
Fort ignore TKR: yes (!DEC$ ATTRIBUTES NO_ARG_CHECK ::)
Fort 08 assumed shape: no
Fort optional args: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort PROTECTED: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort f08 using wrappers: yes
Fort integer size: 4
Fort logical size: 4
Fort logical value true: -1
Fort have integer1: yes
Fort have integer2: yes
Fort have integer4: yes
Fort have integer8: yes
Fort have integer16: no
Fort have real4: yes
Fort have real8: yes
Fort have real16: yes
Fort have complex8: yes
Fort have complex16: yes
Fort have complex32: yes
Fort integer1 size: 1
Fort integer2 size: 2
Fort integer4 size: 4
Fort integer8 size: 8
Fort integer16 size: -1
Fort real size: 4
Fort real4 size: 4
Fort real8 size: 8
Fort real16 size: 16
Fort dbl prec size: 8
Fort cplx size: 8
Fort dbl cplx size: 16
Fort cplx8 size: 8
Fort cplx16 size: 16
Fort cplx32 size: 32
Fort integer align: 1
Fort integer1 align: 1
Fort integer2 align: 1
Fort integer4 align: 1
Fort integer8 align: 1
Fort integer16 align: -1
Fort real align: 1
Fort real4 align: 1
Fort real8 align: 1
Fort real16 align: 1
Fort dbl prec align: 1
Fort cplx align: 1
Fort dbl cplx align: 1
Fort cplx8 align: 1
Fort cplx16 align: 1
Fort cplx32 align: 1
C profiling: yes
C++ profiling: yes
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
Sparse Groups: no
Build CFLAGS: -DNDEBUG -O2 -finline-functions -fno-strict-aliasing -restrict -Qoption,cpp,--extended_float_types -pthread
Build CXXFLAGS: -DNDEBUG -O2 -finline-functions -pthread
Build FCFLAGS: -D_GNU_SOURCE -traceback -O2
Build LDFLAGS: -export-dynamic -static-intel
Build LIBS: -lrt -lutil
Wrapper extra CFLAGS: -pthread
Wrapper extra CXXFLAGS: -pthread
Wrapper extra FCFLAGS:
Wrapper extra LDFLAGS: -L/opt/mellanox/fca/lib -L/opt/mellanox/mxm/lib -Wl,-rpath -Wl,/opt/mellanox/fca/lib -Wl,-rpath -Wl,/opt/mellanox/mxm/lib -Wl,-rpath -Wl,@{libdir} -Wl,--enable-new-dtags
Wrapper extra LIBS: -lm -lnuma -lutil -ldl -lrt -losmcomp -libverbs -lrdmacm -lfca -lpsm_infinipath -lmxm -L/opt/mellanox/mxm/lib
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: yes
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions:
FT Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
VampirTrace support: no
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128