Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

2022-03-13 Thread Gilles Gouaillardet via users
Ernesto, the coll/tuned module (which should handle collective subroutines by default) has a known issue when matching but non-identical signatures are used: for example, one rank uses one vector of n bytes, and another rank uses n bytes. Is there a chance your application might use this pattern?
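
To make the pattern concrete, here is a rough toy model in Python. It is not Open MPI code and calls no MPI functions; the names `type_signature`, `BYTE`, `VECTOR_OF_N_BYTES`, `rank_a` and `rank_b`, and the size `n = 8`, are made up for illustration. It only shows how two ranks can describe the same sequence of bytes (matching signatures) while passing different (count, datatype) arguments (non-identical).

```python
# Toy model only: this does not call MPI or Open MPI.
# A "datatype" is modelled as a tuple of basic element names.

def type_signature(count, datatype):
    """Flatten (count, datatype) into the sequence of basic elements it describes."""
    return tuple(datatype) * count

n = 8  # hypothetical message size in bytes

BYTE = ("byte",)              # stands in for a basic byte type
VECTOR_OF_N_BYTES = BYTE * n  # stands in for a derived type holding n bytes

# Rank A passes one element of the derived type; rank B passes n single bytes.
rank_a = (1, VECTOR_OF_N_BYTES)
rank_b = (n, BYTE)

# Both ranks describe the same n basic elements ...
signatures_match = type_signature(*rank_a) == type_signature(*rank_b)
# ... but the (count, datatype) pairs they pass are different.
arguments_identical = rank_a == rank_b

print("signatures match:", signatures_match)
print("arguments identical:", arguments_identical)
```

If the application mixes calls like these across ranks in the same collective, it would fall into the case described above.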

Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

2022-03-13 Thread Ernesto Prudencio via users
Forgot to mention that in all 3 situations, mpirun is called as follows (35 nodes, 4 MPI ranks per node):

    mpirun -x LD_LIBRARY_PATH=:::... -hostfile /tmp/hostfile.txt -np 140 -npernode 4 --mca btl_tcp_if_include eth0

So I have a question 3) Should I add some extra option in the mpirun command

Re: [OMPI users] [Ext] Re: Call to MPI_Allreduce() returning value 15

2022-03-13 Thread Ernesto Prudencio via users
Thank you for the quick answer, George. I wanted to investigate the problem further before replying. Below I show 3 situations of my C++ (and Fortran) application, which runs on top of PETSc, OpenMPI, and MKL. All 3 situations use MKL 2019.0.5 compiled with INTEL. At the end, I have 2 question