On Oct 27, 2006, at 10:56 AM, laurent.po...@fr.thalesgroup.com wrote:
I did change the default error handler (using
MPI_Comm_set_errhandler) in the main_exe program. I replaced it
with a handler that just does a printf.
My error handler is never called, but main_exe receives a SIGPIPE
signal.
So the only solution I found is to catch SIGPIPE and ignore it...
I wonder how this SIGPIPE gets generated... and why we didn't catch it.
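For reference, the usual way a process ends up with SIGPIPE is by writing
to a pipe or socket whose other end has already gone away. Whether that is
what happens inside Open MPI 1.1.1 after exe1 and orted2 exit is an
assumption here, not something confirmed in this thread. A minimal sketch
in plain POSIX C (no MPI involved; the function name is made up for
illustration) that reproduces the mechanism in a child process:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the signal that killed a child which wrote
 * to a pipe with no reader, or -1 if the child was not killed by a signal. */
int signal_from_broken_pipe_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int fds[2];
        signal(SIGPIPE, SIG_DFL);      /* make sure the default action applies */
        if (pipe(fds) != 0)
            _exit(1);
        close(fds[0]);                 /* the reading side goes away */
        if (write(fds[1], "x", 1) < 0) /* no reader left: SIGPIPE is raised */
            _exit(2);                  /* only reached if SIGPIPE was not fatal */
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With the default disposition the child never gets to look at the return
value of write(); it is terminated by the signal, which matches what is
reported for main_exe.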
It is not supposed to work, and if you find an MPI implementation
that supports this approach, please tell me. I know the paper
where you read about this, but even with their MPI library this
approach does not work.
Which paper are you talking about?
I was talking about W. Gropp's paper, "Fault Tolerance in MPI
Programs". I don't remember where it was published; it might be one
of the Euro PVM/MPI conferences. Here is a link to the paper:
http://www-unix.mcs.anl.gov/~gropp/bib/papers/2002/mpi-fault.pdf
Thanks,
george.
Thanks,
Laurent.
Thanks,
george.
On Oct 26, 2006, at 10:19 AM, laurent.po...@fr.thalesgroup.com wrote:
Hi,
I developed a launcher application:
an MPI application (say main_exe) launches 2 MPI applications (say
exe1 and exe2), using MPI_Comm_spawn_multiple.
Now I'm looking at the behavior when an exe crashes.
What I see is the following:
1) when everything is launched, I see the following processes,
using 'ps':
- the 'mpiexec -v -d -n 1 ./main_exe' command
- the orted server used for 'main_exe' (say 'orted1')
- main_exe
- the orted server used for 'exe1' and 'exe2' (say 'orted2')
- exe1
- exe2
2) I use kill -9 to 'crash' exe2
3) orted2 and exe1 finish.
4) with ps, I see that the following processes remain: mpiexec,
'orted1', main_exe
5) main_exe tries to send a message to exe1, using MPI_Bsend:
main_exe gets killed by a SIGPIPE signal!
So what I see is that when one part of an MPI application crashes,
the whole application crashes!
Is there a way to get different behavior? For example, MPI_Bsend
could return an error.
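At the operating-system level, the "return an error instead of dying"
behavior exists for plain writes: if SIGPIPE is ignored, write() on a
broken pipe fails with EPIPE rather than killing the process. The sketch
below shows only that OS mechanism in plain POSIX C. It does not use MPI,
the function name is hypothetical, and it is an assumption (not confirmed
here) that an MPI library would turn such a failed write into an error
code from MPI_Bsend.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE, write to a pipe with no reader,
 * and return the errno reported by write() (0 if the write succeeded).
 * Note: signal() changes the disposition for the whole process. */
int errno_from_ignored_sigpipe_write(void)
{
    int fds[2];
    int saved = 0;

    signal(SIGPIPE, SIG_IGN);        /* "catch and forget" SIGPIPE */
    if (pipe(fds) != 0)
        return errno;
    close(fds[0]);                   /* the reading side goes away */
    if (write(fds[1], "x", 1) < 0)   /* fails instead of killing us */
        saved = errno;
    close(fds[1]);
    return saved;
}
```

Called once, this is expected to return EPIPE, and the caller keeps
running and can decide what to do about the lost peer.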
A few additional details:
- I work on Linux, with Open MPI 1.1.1.
- I'm developing in C and C++.
Thanks,
Laurent.