Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Júlio Hoffimann
That is great news! I would also have voted to remove the bindings. Boost.MPI is the only library I have ever used for MPI in C++; it is a much better designed object-oriented library, not just bindings. ;-) With Boost.MPI we can send our own types through MPI messages by serializing the objects, which

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Ralph Castain
I doubt anything will be done about those warnings, given that the MPI Forum has voted to remove the C++ bindings altogether. On Mar 25, 2012, at 12:36 PM, Júlio Hoffimann wrote: > I have no much time now for trying a more recent version, but i'll keep that > in mind. I also dislike the warnin

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Júlio Hoffimann
I don't have much time right now to try a more recent version, but I'll keep that in mind. I also dislike the warnings my current version is giving me (http://www.open-mpi.org/community/lists/devel/2011/08/9606.php). I'll see how to contact the Ubuntu maintainers to update OpenMPI and solve both problems i

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Ralph Castain
On Mar 25, 2012, at 11:28 AM, Júlio Hoffimann wrote: > I wrote the version in a previous P.S. statement: MPI 1.4.3 from Ubuntu 11.10 > repositories. :-) Sorry - I see a lot of emails over the day, and forgot. :-/ Have you tried this on something more recent, like 1.5.4 or even the developer's

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Júlio Hoffimann
I wrote the version in a previous P.S. statement: MPI 1.4.3 from Ubuntu 11.10 repositories. :-) Thanks for the clarifications! 2012/3/25 Ralph Castain > > On Mar 25, 2012, at 10:57 AM, Júlio Hoffimann wrote: > > I forgot to mention, i tried to set the odls_base_sigkill_timeout as you > told, ev

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Ralph Castain
On Mar 25, 2012, at 10:57 AM, Júlio Hoffimann wrote: > I forgot to mention, i tried to set the odls_base_sigkill_timeout as you > told, even 5s was not sufficient for the root execute it's task, and most > important, the kill was instantaneous, there is no 5s hang. My erroneous > conclusion: S

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Júlio Hoffimann
I forgot to mention: I tried setting odls_base_sigkill_timeout as you suggested. Even 5s was not enough for the root to execute its task and, most importantly, the kill was instantaneous; there was no 5s hang. My erroneous conclusion: SIGKILL was being sent instead of SIGTERM. About the man page, at

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Ralph Castain
On Mar 25, 2012, at 7:19 AM, Júlio Hoffimann wrote: > Dear Ralph, > > Thank you for your prompt reply. I confirmed what you just said by reading > the mpirun man page at the sections Signal Propagation and Process > Termination / Signal Handling. > > "During the run of an MPI application, i

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-25 Thread Júlio Hoffimann
Dear Ralph, Thank you for your prompt reply. I confirmed what you just said by reading the mpirun man page, in the sections *Signal Propagation* and *Process Termination / Signal Handling*. "During the run of an MPI application, if any rank dies abnormally (either exiting before invoking MPI

Re: [OMPI users] SIGTERM propagation across MPI processes

2012-03-23 Thread Ralph Castain
Well, yes and no. When a process abnormally terminates, OMPI will kill the job - this is done by first hitting each process with a SIGTERM, followed shortly thereafter by a SIGKILL. So you do have a short time on each process to attempt to clean up. My guess is that your signal handler actually

[OMPI users] SIGTERM propagation across MPI processes

2012-03-22 Thread Júlio Hoffimann
Dear all, I'm trying to handle signals inside an MPI task-farming model. The following is pseudocode of what I'm trying to achieve: volatile sig_atomic_t unexpected_error_occurred = 0; void my_handler( int sig ){ unexpected_error_occurred = 1;} somewhere in the code...// signal(SIGTERM, my