> From what you sent, it appears that Open MPI thinks your processes called
> MPI_Abort (as opposed to segfaulting or some other failure mode). The system
> appears to be operating exactly as it should - it just thinks your program
> aborted the job - i.e., that one or more processes actually called MPI_Abort
> for some reason.

I do not see any other process produce an error or call MPI_Abort. I don't even know which of the 4 processes is the cause.

> Have you tried running your code without valgrind? I'm wondering if the
> valgrind interaction may be part of the problem.

Yes, when I run it without valgrind I get the problem. With valgrind there is no problem. I had hoped valgrind would help me identify the problem.

> Do you have a code path in your program that would lead to MPI_Abort? I'm
> wondering if you have some logic that might abort if it encounters what it
> believes is a problem. If so, you might put some output in that path to see
> if you are traversing it. Then we would have some idea as to why the code
> thinks it *should* abort.

I do not call MPI_Abort explicitly anywhere in my code, and so far I can't see any logic that would cause it to be called. My code is actually quite simple MPI-wise: one process reads a config file, then I call MPI_Bcast to deliver the configuration parameters to the other processes. Then I set up the subdomains, and after that I only use MPI_Send and MPI_Recv for convergence checks and ghost shell exchange. But with a grid size of 40x40x40, for example, it doesn't even reach the computational part.
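
To make the subdomain step described above concrete, here is a minimal sketch in C. It is not the code from the program under discussion. It assumes a simple 1D block split of one grid axis across the ranks, and the names `Subdomain` and `block_split` are hypothetical. It uses no MPI calls, so it can be checked on its own.

```c
#include <assert.h>

/* Hypothetical type: the contiguous range of grid indices owned by one rank. */
typedef struct {
    int start;  /* first global index owned by this rank */
    int count;  /* number of grid points owned by this rank */
} Subdomain;

/* Hypothetical helper: split n grid points along one axis over nprocs ranks.
 * Assumes a 1D block decomposition; when n is not divisible by nprocs,
 * the first (n % nprocs) ranks each get one extra point. */
Subdomain block_split(int n, int nprocs, int rank)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);

    int base  = n / nprocs;   /* points every rank gets */
    int extra = n % nprocs;   /* leftover points, given to the lowest ranks */

    Subdomain s;
    s.count = base + (rank < extra ? 1 : 0);
    s.start = rank * base + (rank < extra ? rank : extra);
    return s;
}
```

With 40 points and 4 processes, each rank would own 10 points under this scheme. Printing `start` and `count` together with the rank right after the decomposition is one cheap way to see how far each process gets before the job is aborted.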