Thanks Howard,

I’ve attached the ompi_info -c output.

Below are the code snippets that all processes execute.  The first one contains the 
MPI_REDUCE calls; they are plain MPI_REDUCE, not MPI_IREDUCE.  Is that what you mean 
by ‘variant’?  The second one contains the MPI_ALLREDUCE.
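In case it is relevant, the nonblocking variant would look roughly like the sketch 
below, shown only for contrast using the same variable names as Snippet #1; this is 
not what our code does, and the request variable here is just illustrative:

      integer :: request
      ! Hypothetical nonblocking form (MPI_IREDUCE); our code uses plain MPI_REDUCE.
      call mpi_ireduce(atots,atots_tot,1,my_mpi_real,MPI_SUM,0,exec_comm,request,ierr)
      ! ... unrelated work could overlap with the reduction here ...
      call mpi_wait(request,MPI_STATUS_IGNORE,ierr)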

All the processes execute both of these regions of the code.  When the job hangs, I 
notice it after 15 or 20 minutes of no progress.  Then I kill one of the processes, 
and the stack trace indicates that most of the processes are still in the 
next-to-last MPI_REDUCE (the 3rd of the 4 that you see), but 3 of them are in the 
MPI_ALLREDUCE.  I miscounted earlier when I said the majority of processes were in 
the 4th MPI_REDUCE out of 5; it was the 3rd out of 4.

You also asked about sizes.  The first two MPI_REDUCE calls in the loop below 
involve 1 element each; the last two involve num_quans elements, which is 22 in the 
case that hangs.

I will post some of the coll_base_verbose output in the next e-mail.

Thanks again

Ed

Snippet #1

    do k = 1, num_integrations
      if (integration(k)%skip) cycle

      atots_tot = 0.0_fdf
      atots = integration(k)%atots  ! locally accumulated
      call mpi_reduce(atots,atots_tot,1,my_mpi_real,MPI_SUM,0,exec_comm,ierr)
      if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce atots mpi_reduce',ierr)
      integration(k)%atots = atots_tot

      rats_tot = 0.0_fdf
      rats = integration(k)%rats  ! locally accumulated
      call mpi_reduce(rats,rats_tot,1,my_mpi_real,MPI_SUM,0,exec_comm,ierr)
      if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce rats mpi_reduce',ierr)
      integration(k)%rats = rats_tot

      int_data_tot = 0.0_fdf
      call mpi_reduce(integration(k)%int_data,int_data_tot, &
                      integration(k)%num_quans,my_mpi_real,MPI_SUM, &
                      0,exec_comm,ierr)
      if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce int_data mpi_reduce',ierr)
      integration(k)%int_data = int_data_tot

      quan_num_max = 0
      call mpi_reduce(integration(k)%quan_num,quan_num_max, &
                      integration(k)%num_quans,MPI_INTEGER,MPI_MAX, &
                      0,exec_comm,ierr)
      if (ierr /= MPI_SUCCESS) call handle_mpi_error('inforce quan_num mpi_reduce',ierr)
      integration(k)%quan_num = quan_num_max

    enddo


Snippet #2:
      ! Everybody gets the information about whether any cells have failed.
      itmp(1) = wallfn_runinfo%nwallfn_cells
      itmp(2) = wallfn_runinfo%ncells_failed
      itmp(3) = wallfn_runinfo%ncells_printed
      itmpg = 0
      call mpi_allreduce(itmp,itmpg,3,MPI_INTEGER,MPI_SUM,exec_comm,ierr)
      if (ierr /= MPI_SUCCESS) call handle_mpi_error('wallfn_runinfo_dump_errors mpi_allreduce',ierr)
      g_nwallfn_cells = itmpg(1)
      g_ncells_failed = itmpg(2)
      g_ncells_printed = itmpg(3)


From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Friday, September 26, 2014 4:10 PM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Application hangs in 1.8.1 related to 
collective operations

Hello Ed,

Could you post the output of ompi_info?  It would also help to know which variant 
of the collective ops you're doing.  If you could post the output when you run with

mpirun --mca coll_base_verbose 10 "other mpirun args you've been using"

that would be great.

Also, if you know the sizes (number of elements) involved in the reduce and 
allreduce operations, it would be helpful to know that as well.

Thanks,

Howard


2014-09-25 3:34 GMT-06:00 Blosch, Edwin L <edwin.l.blo...@lmco.com>:
I had an application suddenly stop making progress.  By killing the last of the 208 
processes and then looking at the stack trace, I found that 3 of the 208 processes 
were in an MPI_REDUCE call.  The other 205 had progressed in their execution to 
another routine, where they were waiting in an unrelated MPI_ALLREDUCE call.

The code structure is such that each process calls MPI_REDUCE 5 times for different 
variables, then some work is done, then the MPI_ALLREDUCE call happens early in the 
next iteration of the solution procedure.  I thought it was also noteworthy that the 
3 processes stuck at MPI_REDUCE were actually stuck on the 4th of the 5 MPI_REDUCE 
calls, not the 5th.

There are no issues with MVAPICH.  The problem is easily worked around by adding an 
MPI_BARRIER after the section of MPI_REDUCE calls.
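
Concretely, the workaround is just something like the sketch below, placed after the 
loop of MPI_REDUCE calls (same exec_comm and error handler as in the snippets; the 
label passed to the error handler is only illustrative):

      ! Workaround: hold every rank here until all ranks have left the
      ! MPI_REDUCE section, so nobody reaches the later MPI_ALLREDUCE early.
      call mpi_barrier(exec_comm,ierr)
      if (ierr /= MPI_SUCCESS) call handle_mpi_error('post-reduce mpi_barrier',ierr)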

It seems like MPI_REDUCE has some kind of non-blocking implementation, and it 
was not safe to enter the MPI_ALLREDUCE while those MPI_REDUCE calls had not 
yet completed for other processes.

This was with Open MPI 1.8.1.  The same problem was seen on 3 slightly different 
systems, all QDR InfiniBand with Mellanox HCAs, using a Mellanox OFED stack 
(slightly different versions on each cluster) and Intel compilers, again in slightly 
different versions on each of the 3 systems.

Has anyone encountered anything similar?  While I have a workaround, I want to 
make sure the root cause of the deadlock gets fixed.  Please let me know what I 
can do to help.

Thanks,

Ed

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/09/25389.php

           Configured by: bloscel
           Configured on: Sun Sep 28 12:34:59 CDT 2014
          Configure host: head.cluster
                Built by: bloscel
                Built on: Sun Sep 28 12:55:50 CDT 2014
              Built host: head.cluster
              C bindings: yes
            C++ bindings: yes
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to 
limitations in the /applocal/intel/composer_xe_2013/bin/ifort compiler, does 
not support the following: array subsections, direct passthru (where possible) 
to underlying Open MPI's C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: /applocal/intel/composer_xe_2013/bin/icc
     C compiler absolute: 
  C compiler family name: INTEL
      C compiler version: 1310.20130514
             C char size: 1
             C bool size: 1
            C short size: 2
              C int size: 4
             C long size: 8
            C float size: 4
           C double size: 8
          C pointer size: 8
            C char align: 1
            C bool align: 1
             C int align: 4
           C float align: 4
          C double align: 8
            C++ compiler: /applocal/intel/composer_xe_2013/bin/icpc
   C++ compiler absolute: none
           Fort compiler: /applocal/intel/composer_xe_2013/bin/ifort
       Fort compiler abs: 
         Fort ignore TKR: yes (!DEC$ ATTRIBUTES NO_ARG_CHECK ::)
   Fort 08 assumed shape: no
      Fort optional args: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
          Fort PROTECTED: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
 Fort f08 using wrappers: yes
       Fort integer size: 4
       Fort logical size: 4
 Fort logical value true: -1
      Fort have integer1: yes
      Fort have integer2: yes
      Fort have integer4: yes
      Fort have integer8: yes
     Fort have integer16: no
         Fort have real4: yes
         Fort have real8: yes
        Fort have real16: yes
      Fort have complex8: yes
     Fort have complex16: yes
     Fort have complex32: yes
      Fort integer1 size: 1
      Fort integer2 size: 2
      Fort integer4 size: 4
      Fort integer8 size: 8
     Fort integer16 size: -1
          Fort real size: 4
         Fort real4 size: 4
         Fort real8 size: 8
        Fort real16 size: 16
      Fort dbl prec size: 8
          Fort cplx size: 8
      Fort dbl cplx size: 16
         Fort cplx8 size: 8
        Fort cplx16 size: 16
        Fort cplx32 size: 32
      Fort integer align: 1
     Fort integer1 align: 1
     Fort integer2 align: 1
     Fort integer4 align: 1
     Fort integer8 align: 1
    Fort integer16 align: -1
         Fort real align: 1
        Fort real4 align: 1
        Fort real8 align: 1
       Fort real16 align: 1
     Fort dbl prec align: 1
         Fort cplx align: 1
     Fort dbl cplx align: 1
        Fort cplx8 align: 1
       Fort cplx16 align: 1
       Fort cplx32 align: 1
             C profiling: yes
           C++ profiling: yes
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes, 
OMPI progress: no, ORTE progress: yes, Event lib: yes)
           Sparse Groups: no
            Build CFLAGS: -DNDEBUG -O2 -finline-functions -fno-strict-aliasing 
-restrict -Qoption,cpp,--extended_float_types -pthread
          Build CXXFLAGS: -DNDEBUG -O2 -finline-functions -pthread
           Build FCFLAGS: -D_GNU_SOURCE -traceback  -O2
           Build LDFLAGS: -export-dynamic  -static-intel
              Build LIBS: -lrt -lutil
    Wrapper extra CFLAGS: -pthread
  Wrapper extra CXXFLAGS: -pthread
   Wrapper extra FCFLAGS: 
   Wrapper extra LDFLAGS: -L/opt/mellanox/fca/lib -L/opt/mellanox/mxm/lib    
-Wl,-rpath -Wl,/opt/mellanox/fca/lib -Wl,-rpath -Wl,/opt/mellanox/mxm/lib 
-Wl,-rpath -Wl,@{libdir} -Wl,--enable-new-dtags
      Wrapper extra LIBS: -lm -lnuma -lutil -ldl -lrt -losmcomp -libverbs 
-lrdmacm -lfca -lpsm_infinipath -lmxm -L/opt/mellanox/mxm/lib
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: yes
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
     Symbol vis. support: yes
   Host topology support: yes
          MPI extensions: 
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
     VampirTrace support: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
