*MPI_ERR_PROC_FAILED is not yet a valid error in MPI. It is coming from
ULFM, an extension to MPI that is not yet in the OMPI master.*

*Daniel, what version of Open MPI are you using? Are you sure you are not
mixing multiple versions due to PATH/LD_LIBRARY_PATH?*

*George.*


On Mon, Jan 11, 2021 at 21:31 Gilles Gouaillardet via users <
users@lists.open-mpi.org> wrote:

> Daniel,
>
> the test works in my environment (1 node, 32 GB memory) with all the
> mentioned parameters.
>
> Did you check the memory usage on your nodes and make sure the OOM
> killer did not kill any process?
>
> Cheers,
>
> Gilles
>
> On Tue, Jan 12, 2021 at 1:48 AM Daniel Torres via users
> <users@lists.open-mpi.org> wrote:
> >
> > Hi.
> >
> > Thanks for responding. I have taken the most important parts from my
> code and I created a test that reproduces the behavior I described
> previously.
> >
> > I attach to this e-mail the compressed file "test.tar.gz". Inside it,
> you can find:
> >
> > 1.- The .c source code "test.c", which I compiled with "mpicc -g -O3
> test.c -o test -lm". The main work is performed in the function
> "work_on_grid", starting at line 162.
> > 2.- Four execution examples on two different machines (my own and a
> cluster machine), which I executed with "mpiexec -np 16 --machinefile
> hostfile --map-by node --mca btl tcp,vader,self --mca btl_base_verbose 100
> ./test 4096 4096", varying the last two arguments over 4096, 8192 and 16384
> (the matrix size). The error appears with the bigger sizes (8192 on my
> machine, 16384 on the cluster).
> > 3.- The "ompi_info -a" output from the two machines.
> > 4.- The hostfile.
> >
> > The duration of the delay is just a few seconds, about 3 ~ 4.
> >
> > Essentially, the first error message I get from a waiting process is
> "74: MPI_ERR_PROC_FAILED: Process Failure".
> >
> > Hope this information can help.
> >
> > Thanks a lot for your time.
> >
> > On 08/01/21 at 18:40, George Bosilca via users wrote:
> >
> > Daniel,
> >
> > There are no timeouts in OMPI with the exception of the initial
> connection over TCP, where we use the socket timeout to prevent deadlocks.
> As you already did quite a few communicator duplications and other
> collective communications before you see the timeout, we need more info
> about this. As Gilles indicated, having the complete output might help.
> What is the duration of the delay for the waiting process? Also, can you
> post a reproducer of this issue?
> >
> >   George.
> >
> >
> > On Fri, Jan 8, 2021 at 9:03 AM Gilles Gouaillardet via users <
> users@lists.open-mpi.org> wrote:
> >>
> >> Daniel,
> >>
> >> Can you please post the full error message and share a reproducer for
> >> this issue?
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> On Fri, Jan 8, 2021 at 10:25 PM Daniel Torres via users
> >> <users@lists.open-mpi.org> wrote:
> >> >
> >> > Hi all.
> >> >
> >> > I'm currently implementing an algorithm that creates a process grid
> and divides it into row and column communicators as follows:
> >> >
> >> >              col_comm0    col_comm1    col_comm2    col_comm3
> >> > row_comm0    P0           P1           P2           P3
> >> > row_comm1    P4           P5           P6           P7
> >> > row_comm2    P8           P9           P10          P11
> >> > row_comm3    P12          P13          P14          P15
> >> >
> >> > Then, every process works on its own column communicator and
> broadcasts data on the row communicators.
> >> > While column operations are being executed, processes not included in
> the current column communicator just wait for results.
> >> >
> >> > At some point, a column communicator may be split to create a temporary
> communicator so that only the appropriate processes work on it.
> >> >
> >> > At the end of a step, a call to MPI_Barrier (on a duplicate of
> MPI_COMM_WORLD) is executed to sync all processes and avoid bad results.
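> >> >
> >> > For clarity, here is a minimal sketch of this communicator layout. It is
> >> > simplified and illustrative (not the actual test.c); the square 4x4 grid
> >> > and names such as grid_dim are just assumptions:
> >> >
> >> > #include <mpi.h>
> >> > #include <math.h>
> >> >
> >> > int main(int argc, char **argv)
> >> > {
> >> >     MPI_Init(&argc, &argv);
> >> >
> >> >     int rank, size;
> >> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >> >
> >> >     int grid_dim = (int) sqrt((double) size);  /* e.g. 4 for 16 procs */
> >> >     int my_row = rank / grid_dim;
> >> >     int my_col = rank % grid_dim;
> >> >
> >> >     /* Same row index -> same row communicator; same column index ->
> >> >      * same column communicator. */
> >> >     MPI_Comm row_comm, col_comm, world_dup;
> >> >     MPI_Comm_split(MPI_COMM_WORLD, my_row, my_col, &row_comm);
> >> >     MPI_Comm_split(MPI_COMM_WORLD, my_col, my_row, &col_comm);
> >> >     MPI_Comm_dup(MPI_COMM_WORLD, &world_dup);
> >> >
> >> >     /* ... work on col_comm, broadcast results along row_comm ... */
> >> >
> >> >     /* End-of-step synchronization on the duplicate of MPI_COMM_WORLD. */
> >> >     MPI_Barrier(world_dup);
> >> >
> >> >     MPI_Comm_free(&row_comm);
> >> >     MPI_Comm_free(&col_comm);
> >> >     MPI_Comm_free(&world_dup);
> >> >     MPI_Finalize();
> >> >     return 0;
> >> > }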
> >> >
> >> > With a small amount of data (a small matrix) the MPI_Barrier call
> syncs correctly on the communicator that includes all processes and
> processing ends fine.
> >> > But when the amount of data (a big matrix) increases, operations
> on the column communicators take more time to finish, and hence the waiting
> time also increases for the waiting processes.
> >> >
> >> > After some time, the waiting processes return an error while they are
> still waiting for the broadcast (MPI_Bcast) on the row communicators, or
> while they wait at the sync point (MPI_Barrier) after finishing their work.
> Then, when the operations on the current column communicator end, the still
> active processes try to broadcast on the row communicators and they fail
> too, because the waiting processes have already returned an error. So all
> processes fail at different moments in time.
> >> >
> >> > So my problem is that waiting processes "believe" that the current
> operations have failed (but they have not finished yet!) and they fail too.
> >> >
> >> > So I have a question about MPI_Bcast/MPI_Barrier:
> >> >
> >> > Is there a way to increase the timeout a process can wait for a
> broadcast or barrier to complete?
> >> >
> >> > Here is my machine and OpenMPI info:
> >> > - OpenMPI version: Open MPI 4.1.0u1a1
> >> > - OS: Linux Daniel 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15
> 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> >> >
> >> > Thanks in advance for reading my description/question.
> >> >
> >> > Best regards.
> >> >
> >> > --
> >> > Daniel Torres
> >> > LIPN - Université Sorbonne Paris Nord
> >
> > --
> > Daniel Torres
> > LIPN - Université Sorbonne Paris Nord
>
