Thanks for the hint on "mpirun ldd". I will try it. The problem is that I am running in the cloud, so it is trickier to get onto a node at run time or to save information to be retrieved later.
Sorry for my ignorance on MCA stuff, but what exactly would be the suggested mpirun command line options for coll / tuned?

Cheers,

Ernesto.

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Gilles Gouaillardet via users
Sent: Monday, March 14, 2022 2:22 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Subject: Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

Ernesto,

You can "mpirun ldd <your binary>" and double-check it uses the library you expect.

You might want to try adapting your trick to use Open MPI 4.1.2 with your binary built with Open MPI 4.0.3 and see how it goes. I'd try disabling coll/tuned first, though.

Keep in mind PETSc might call MPI_Allreduce() under the hood with matching but different signatures.

Cheers,

Gilles

On Mon, Mar 14, 2022 at 4:09 PM Ernesto Prudencio via users <users@lists.open-mpi.org> wrote:

Thanks, Gilles.

In the case of the application I am working on, all ranks call MPI with the same signature / types of variables. I do not think there is a code error anywhere; I think this is "just" a configuration error on my part.

Regarding the idea of changing just one item at a time: that would be the next step, but first I would like to check my suspicion that the presence of both "/opt/openmpi_4.0.3" and "/appl-third-parties/openmpi-4.1.2" at run time could be an issue:

* It is an issue in situation 2, when I explicitly point the runtime MPI to 4.1.2 (also used in compilation).
* It is not an issue in situation 3, when I explicitly point the runtime MPI to 4.0.3 compiled with INTEL (even though I compiled the application and openmpi 4.1.2 with GNU, and I link the application with openmpi 4.1.2).

Best,

Ernesto.

From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Sent: Monday, March 14, 2022 1:37 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ernesto Prudencio <epruden...@slb.com>
Subject: Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

Ernesto,

The coll/tuned module (which handles collective subroutines by default) has a known issue when matching but non-identical signatures are used: for example, one rank uses one vector of n bytes, and another rank uses n bytes. Is there a chance your application might use this pattern?

You can try disabling this component with

mpirun --mca coll ^tuned ...

I noted that between the successful a) case and the unsuccessful b) case, you changed 3 parameters:
- compiler vendor
- Open MPI version
- PETSc version

so at this stage it is not obvious which one should be blamed for the failure.

In order to get a better picture, I would first try
- Intel compilers
- Open MPI 4.1.2
- PETSc 3.10.4
=> a failure would suggest a regression in Open MPI.

And then
- Intel compilers
- Open MPI 4.0.3
- PETSc 3.16.5
=> a failure would suggest either a regression in PETSc, or PETSc doing something different but legitimate that evidences a bug in Open MPI.

If you have time, you can also try
- Intel compilers
- MPICH (or a derivative such as Intel MPI)
- PETSc 3.16.5
=> a success would strongly point to Open MPI.

Cheers,

Gilles
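P.S. A minimal sketch of the "matching but not identical signatures" pattern, in case it helps: the code below is hypothetical (not taken from PETSc) and uses MPI_Bcast only to keep the illustration short; the same idea applies to other collectives. Every rank transfers the same eight doubles, but the root describes them as one element of a derived datatype while the other ranks describe them as eight MPI_DOUBLEs. The type signatures match, so the program is valid MPI, yet coll/tuned may segment the message differently on each side.

#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> buf(8, static_cast<double>(rank));

    if (rank == 0) {
        // Root: 1 element of a contiguous datatype holding 8 doubles.
        MPI_Datatype vec8;
        MPI_Type_contiguous(8, MPI_DOUBLE, &vec8);
        MPI_Type_commit(&vec8);
        MPI_Bcast(buf.data(), 1, vec8, 0, MPI_COMM_WORLD);
        MPI_Type_free(&vec8);
    } else {
        // Other ranks: 8 elements of MPI_DOUBLE, i.e. the same type signature
        // but different (count, datatype) arguments.
        MPI_Bcast(buf.data(), 8, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}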
On Mon, Mar 14, 2022 at 2:56 PM Ernesto Prudencio via users <users@lists.open-mpi.org> wrote:

Forgot to mention that in all 3 situations, mpirun is called as follows (35 nodes, 4 MPI ranks per node):

mpirun -x LD_LIBRARY_PATH=:<PATH1>:<PATH2>:... -hostfile /tmp/hostfile.txt -np 140 -npernode 4 --mca btl_tcp_if_include eth0 <APPLICATION_PATH> <APPLICATION OPTIONS>

So I have question 3) Should I add some extra option in the mpirun command line in order to make situation 2 successful?

Thanks,

Ernesto.

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ernesto Prudencio via users
Sent: Monday, March 14, 2022 12:39 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ernesto Prudencio <epruden...@slb.com>
Subject: Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

Thank you for the quick answer, George. I wanted to investigate the problem further before replying.

Below I show 3 situations of my C++ (and Fortran) application, which runs on top of PETSc, OpenMPI, and MKL. All 3 situations use MKL 2019.0.5 compiled with INTEL. At the end, I have 2 questions.

Note: all codes are compiled on a certain set of nodes, and the execution happens on _another_ set of nodes.

+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Situation 1) It has been successful for months now.

a) Use INTEL compilers for OpenMPI 4.0.3, PETSc 3.10.4, and the application. The configuration options for OpenMPI are:
'--with-flux-pmi=no' '--enable-orterun-prefix-by-default' '--prefix=/mnt/disks/intel-2018-3-222-blade-runtime-env-2018-1-07-08-2018-132838/openmpi_4.0.3_intel2019.5_gcc7.3.1' 'FC=ifort' 'CC=gcc'

b) At run time, each MPI rank prints this info:
PATH = /opt/openmpi_4.0.3/bin:/opt/openmpi_4.0.3/bin:/opt/openmpi_4.0.3/bin:/opt/rh/devtoolset-7/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH = /opt/openmpi_4.0.3/lib::/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7:/opt/petsc/lib:/opt/2019.5/compilers_and_libraries/linux/mkl/lib/intel64:/opt/openmpi_4.0.3/lib:/lib64:/lib:/usr/lib64:/usr/lib
MPI version (compile time) = 4.0.3
MPI_Get_library_version() = Open MPI v4.0.3, package: Open MPI root@<STRING1> Distribution, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020
PETSc version (compile time) = 3.10.4

c) A test of 20 minutes with 14 nodes, 4 MPI ranks per node, runs ok.

d) A test of 2 hours with 35 nodes, 4 MPI ranks per node, runs ok.

+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Situation 2) This is the situation failing during execution.

a) Use GNU compilers for OpenMPI 4.1.2, PETSc 3.16.5, and the application.
The configuration options for OpenMPI are:
'--with-flux-pmi=no' '--prefix=/appl-third-parties/openmpi-4.1.2' '--enable-orterun-prefix-by-default'

b) At run time, each MPI rank prints this info:
PATH = /appl-third-parties/openmpi-4.1.2/bin:/appl-third-parties/openmpi-4.1.2/bin:/appl-third-parties/openmpi-4.1.2/bin:/opt/rh/devtoolset-7/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH = /appl-third-parties/openmpi-4.1.2/lib::/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7:/appl-third-parties/petsc-3.16.5/lib:/opt/2019.5/compilers_and_libraries/linux/mkl/lib/intel64:/appl-third-parties/openmpi-4.1.2/lib:/lib64:/lib:/usr/lib64:/usr/lib
MPI version (compile time) = 4.1.2
MPI_Get_library_version() = Open MPI v4.1.2, package: Open MPI root@<STRING2> Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021
PETSc version (compile time) = 3.16.5
PetscGetVersion() = Petsc Release Version 3.16.5, Mar 04, 2022
PetscGetVersionNumber() = 3.16.5

c) Same as (1.c).

d) The test with 35 nodes fails:

d.1) The very first MPI call is an MPI_Allreduce() with the MPI_MAX op: it returns the right values only to rank 0, while all other ranks get the value 0. The routine returns MPI_SUCCESS, though.

d.2) The second MPI call is an MPI_Allreduce() with the MPI_SUM op: again, it returns the right values only to rank 0, while all other ranks get wrong values (mostly 0). The routine also returns MPI_SUCCESS, though.

d.3) The third MPI call is an MPI_Allreduce() with the MPI_MIN op: it returns 15 = MPI_ERR_TRUNCATE. This is the error reported in my first e-mail of March 9.

+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Situation 3) Runs ok!!!

a) Same as (2.a), that is, I continue to compile everything with GNU.

b) At run time, I only change the path of MPI to point to the "old" /opt/openmpi_4.0.3 compiled with INTEL. Each MPI rank prints this info:
PATH = /opt/openmpi_4.0.3/bin:/opt/openmpi_4.0.3/bin:/opt/openmpi_4.0.3/bin:/opt/rh/devtoolset-7/root/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LD_LIBRARY_PATH = /opt/openmpi_4.0.3/lib::/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7:/appl-third-parties/petsc-3.16.5/lib:/opt/2019.5/compilers_and_libraries/linux/mkl/lib/intel64:/opt/openmpi_4.0.3/lib:/lib64:/lib:/lib64:/lib:/usr/lib64:/usr/lib
MPI version (compile time) = 4.1.2
MPI_Get_library_version() = Open MPI v4.0.3, package: Open MPI root@<STRING1> Distribution, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020
PETSc version (compile time) = 3.16.5 (my observation here: this PETSc was compiled using OpenMPI 4.1.2)
PetscGetVersion() = Petsc Release Version 3.16.5, Mar 04, 2022
PetscGetVersionNumber() = 3.16.5

c) Same as (1.c).

d) Same as (1.d).

+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Note: at run time, the nodes have both OpenMPI versions available (4.0.3 compiled with INTEL, and 4.1.2 compiled with GNU). That is why I can apply the "trick" of situation 3 above.
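(For reference, the two MPI version lines above are obtained roughly as follows. This is a sketch, not my exact logging code, and it assumes Open MPI's OMPI_*_VERSION macros from mpi.h for the compile-time value.)

#include <mpi.h>
#include <cstdio>

// Print the Open MPI version the code was compiled against (macros from
// mpi.h) and the version of the library actually loaded at run time.
void printMpiVersions() {
    std::printf("MPI version (compile time) = %d.%d.%d\n",
                OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
    char lib[MPI_MAX_LIBRARY_VERSION_STRING];
    int len = 0;
    MPI_Get_library_version(lib, &len);
    std::printf("MPI_Get_library_version() = %s\n", lib);
}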
Question 1) Am I missing some configuration option on OpenMPI? I have been using the same OpenMPI configuration options as in the stable situation 1.

Question 2) In the failing situation 2, does OpenMPI expect to use some /opt path, even though there is no PATH variable mentioning the "old" /opt/openmpi_4.0.3? I mean, could the problem be that I am providing the "new" OpenMPI 4.1.2 in a path (/appl-third-parties/...) that is NOT under /opt?

Thank you,

Ernesto.

From: George Bosilca <bosi...@icl.utk.edu>
Sent: Wednesday, March 9, 2022 1:46 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ernesto Prudencio <epruden...@slb.com>
Subject: [Ext] Re: [OMPI users] Call to MPI_Allreduce() returning value 15

There are two ways MPI_Allreduce can return MPI_ERR_TRUNCATE:

1. It is propagated from one of the underlying point-to-point communications, which means that at least one of the participants has an input buffer with a larger size. I know you said the size is fixed, but that only matters if all processes are in the same blocking MPI_Allreduce.

2. The code is not SPMD, and one of your processes calls a different MPI_Allreduce on the same communicator.

There is no simple way to get more information about this issue. If you have a version of OMPI compiled in debug mode, you can increase the verbosity of the collective framework to see if you get more interesting information.

George.

On Wed, Mar 9, 2022 at 2:23 PM Ernesto Prudencio via users <users@lists.open-mpi.org> wrote:

Hello all,

The very simple code below returns mpiRC = 15.

const std::array< double, 2 > rangeMin { minX, minY };
std::array< double, 2 > rangeTempRecv { 0.0, 0.0 };
int mpiRC = MPI_Allreduce( rangeMin.data(), rangeTempRecv.data(), rangeMin.size(), MPI_DOUBLE, MPI_MIN, PETSC_COMM_WORLD );

Some information before my questions:

1. The environment I am running this code in has hundreds of compute nodes, each node with 4 MPI ranks.
2. It is running in the cloud, so it is tricky to get extra information "on the fly".
3. I am using OpenMPI 4.1.2 + PETSc 3.16.5 + GNU compilers.
4. The error happens consistently at the same point in the execution, at ranks 1 and 2 only (out of hundreds of MPI ranks).
5. By the time the execution gets to the code above, it has already called PetscInitialize() and many MPI routines successfully.
6. Before the call to MPI_Allreduce() above, the code calls MPI_Barrier(), so all ranks do call MPI_Allreduce().
7. At https://www.open-mpi.org/doc/current/man3/OpenMPI.3.php it is written "MPI_ERR_TRUNCATE 15 Message truncated on receive."
8. At https://www.open-mpi.org/doc/v4.1/man3/MPI_Allreduce.3.php it is written "The reduction functions ( MPI_Op ) do not return an error value. As a result, if the functions detect an error, all they can do is either call MPI_Abort or silently skip the problem. Thus, if you change the error handler from MPI_ERRORS_ARE_FATAL to something else, for example, MPI_ERRORS_RETURN, then no error may be indicated."
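For what it is worth, my understanding of item (8) is that the error handler must already be set to return error codes rather than abort, otherwise my code would never see the value 15 at all. A minimal sketch of what I believe is in effect (an assumption on my part; PETSc may install an equivalent handler during PetscInitialize()):

// Ask MPI to return error codes on this communicator instead of aborting.
MPI_Comm_set_errhandler(PETSC_COMM_WORLD, MPI_ERRORS_RETURN);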
Questions:

1. Any ideas for what could be the cause of the return code 15? The code is pretty simple and the buffers have a fixed size = 2.
2. In view of item (8), does it mean that the return code 15 in item (7) might not be informative?
3. Once I get a return code != MPI_SUCCESS, is there any routine I can call, in the application code, to get extra information on MPI?
4. Once the application aborts (I throw an exception once a return code is != MPI_SUCCESS), is there some command line I can run on all nodes in order to get extra info?

Thank you in advance,

Ernesto.
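P.S. Regarding question 3, this is roughly what I have in mind (a sketch only; I am not sure how much extra detail it gives in practice beyond the numeric code):

#include <mpi.h>
#include <iostream>

// Hypothetical helper: decode a non-MPI_SUCCESS return code such as the
// mpiRC returned by the MPI_Allreduce() call above.
void reportMpiError(int mpiRC) {
    int errClass = 0;
    MPI_Error_class(mpiRC, &errClass);        // e.g. 15 -> MPI_ERR_TRUNCATE
    char msg[MPI_MAX_ERROR_STRING];
    int msgLen = 0;
    MPI_Error_string(mpiRC, msg, &msgLen);    // human-readable description
    std::cerr << "MPI error class " << errClass << ": " << msg << std::endl;
}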