> From what you sent, it appears that Open MPI thinks your processes called
> MPI_Abort (as opposed to segfaulting or some other failure mode). The system
> appears to be operating exactly as it should - it just thinks your program
> aborted the job - i.e., that one or more processes actually called MPI_Abort
> for some reason.

I do not see any other process producing an error or calling MPI_Abort.
I don't even know which of the 4 processes is the cause.

> 
> Have you tried running your code without valgrind? I'm wondering if the
> valgrind interaction may be part of the problem.

Yes, when I run it without valgrind I get the problem. With valgrind
there is no problem. I had hoped valgrind would help me identify the
problem.

> 
> Do you have a code path in your program that would lead to MPI_Abort? I'm
> wondering if you have some logic that might abort if it encounters what it
> believes is a problem. If so, you might put some output in that path to see
> if you are traversing it. Then we would have some idea as to why the code
> thinks it *should* abort.

I do not call MPI_Abort explicitly anywhere in my code, and so far I
can't see any code path that would cause it to be called.
My code is actually quite simple MPI-wise: one process reads a config
file, then I call MPI_Bcast to deliver the configuration parameters to
the other processes. Then I set up the subdomains, and after that I only
use MPI_Send and MPI_Recv for convergence checks and ghost-shell
exchange. But with a grid size of 40x40x40, for example, it doesn't even
reach the computational part.
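
Since the failure shows up before the computational part, the subdomain
setup is one step worth checking on every rank. The following is only a
minimal sketch of such a check, assuming a 1D slab split along one axis.
The names slab_t and slab_range are hypothetical and are not taken from
the code discussed here, whose actual decomposition may differ.

```c
#include <stddef.h>

/* Hypothetical type: half-open index range [start, end) along one axis. */
typedef struct {
    int start;
    int end;
} slab_t;

/*
 * Assumed 1D slab decomposition: split n grid points among nprocs ranks.
 * The first (n % nprocs) ranks receive one extra point each.
 * Returns 0 on success, -1 if the arguments are invalid.
 */
int slab_range(int n, int nprocs, int rank, slab_t *out)
{
    if (n <= 0 || nprocs <= 0 || rank < 0 || rank >= nprocs || out == NULL)
        return -1;

    int base  = n / nprocs;   /* points every rank gets */
    int extra = n % nprocs;   /* leftover points, one per low rank */

    out->start = rank * base + (rank < extra ? rank : extra);
    out->end   = out->start + base + (rank < extra ? 1 : 0);
    return 0;
}
```

With 40 points and 4 ranks this gives each rank 10 planes: [0,10),
[10,20), [20,30) and [30,40). Printing the range on each rank before the
first MPI_Send or MPI_Recv would show whether any subdomain is empty or
out of bounds.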
