That is great news! I also would have voted to remove the bindings. Boost.MPI is the only library I have ever used for MPI in C++; it is a much better designed, object-oriented library, not just a set of bindings. ;-)

With Boost.MPI we can send our own types through MPI messages by serializing the objects, which is amazing.

A great post about it: http://daveabrahams.com/2010/09/03/whats-so-cool-about-boost-mpi/
The library: http://www.boost.org/doc/libs/1_49_0/doc/html/mpi.html

Regards,
Júlio.
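A minimal sketch of what sending a user-defined type looks like with Boost.MPI (the Particle struct and the two-rank send/receive are illustrative assumptions of mine, not taken from the post above):

#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>
#include <iostream>
#include <string>

namespace mpi = boost::mpi;

struct Particle {
    double x, y, z;
    std::string label;

    // Boost.Serialization hook: lets Boost.MPI pack/unpack the object.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & x & y & z & label;
    }
};

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
        Particle p = { 1.0, 2.0, 3.0, "electron" };
        world.send(1, 0, p);            // send the whole object to rank 1, tag 0
    } else if (world.rank() == 1) {
        Particle p;
        world.recv(0, 0, p);            // receive and deserialize it
        std::cout << "received " << p.label << std::endl;
    }
    return 0;
}

Compile with the MPI C++ wrapper compiler and link against Boost.MPI and Boost.Serialization (e.g. mpic++ particle_demo.cpp -lboost_mpi -lboost_serialization), then run with something like "mpirun -np 2 ./particle_demo" (the file and executable names are just examples).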
2012/3/25 Ralph Castain <r...@open-mpi.org>

> I doubt anything will be done about those warnings, given that the MPI Forum has voted to remove the C++ bindings altogether.
>
> On Mar 25, 2012, at 12:36 PM, Júlio Hoffimann wrote:
>
> I don't have much time right now to try a more recent version, but I'll keep that in mind. I also dislike the warnings my current version is giving me (http://www.open-mpi.org/community/lists/devel/2011/08/9606.php). I'll see how to contact the Ubuntu maintainers to update Open MPI and solve both problems in one shot. ;-)
>
> Regards,
> Júlio.
>
> 2012/3/25 Ralph Castain <r...@open-mpi.org>
>
>> On Mar 25, 2012, at 11:28 AM, Júlio Hoffimann wrote:
>>
>> I gave the version in a previous P.S.: Open MPI 1.4.3 from the Ubuntu 11.10 repositories. :-)
>>
>> Sorry - I see a lot of emails over the day, and forgot. :-/
>>
>> Have you tried this on something more recent, like 1.5.4 or even the developer's trunk? IIRC, there were some issues in the older 1.4 releases, but they have since been fixed.
>>
>> Thanks for the clarifications!
>>
>> 2012/3/25 Ralph Castain <r...@open-mpi.org>
>>
>>> On Mar 25, 2012, at 10:57 AM, Júlio Hoffimann wrote:
>>>
>>> I forgot to mention: I tried to set odls_base_sigkill_timeout as you suggested, but even 5s was not sufficient for the root to execute its task, and, more importantly, the kill was instantaneous; there was no 5s delay. Hence my erroneous conclusion that SIGKILL was being sent instead of SIGTERM.
>>>
>>> Which version are you using? Could be a bug in there - I can take a look.
>>>
>>> About the man page, at least for me the word "kill" is not clear. Mentioning SIGTERM+SIGKILL explicitly would be unambiguous.
>>>
>>> I'll clarify it - thanks!
>>>
>>> Regards,
>>> Júlio.
>>>
>>> 2012/3/25 Ralph Castain <r...@open-mpi.org>
>>>
>>>> On Mar 25, 2012, at 7:19 AM, Júlio Hoffimann wrote:
>>>>
>>>> Dear Ralph,
>>>>
>>>> Thank you for your prompt reply. I confirmed what you just said by reading the mpirun man page, in the sections *Signal Propagation* and *Process Termination / Signal Handling*:
>>>>
>>>> "During the run of an MPI application, if any rank dies abnormally (either exiting before invoking MPI_FINALIZE, or dying as the result of a signal), mpirun will print out an error message and kill the rest of the MPI application."
>>>>
>>>> If I understood correctly, the SIGKILL signal is sent to every process on a premature death.
>>>>
>>>> Each process receives a SIGTERM, and then a SIGKILL if it doesn't exit within a specified time frame. I told you how to adjust that time period in the prior message.
>>>>
>>>> In my view, I consider this a bug. If Open MPI allows handling signals such as SIGTERM, the other processes in the communicator should also have the opportunity to die gracefully. Perhaps I'm missing something?
>>>>
>>>> Yes, you are - you do get a SIGTERM first, but you are required to exit in a timely fashion. You are not allowed to continue running. This is required in order to ensure proper cleanup of the job, per the MPI standard.
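For reference, a minimal sketch (my own illustration, not code from the thread) of the pattern Ralph describes: the SIGTERM handler only sets a flag, and the process then does a quick, bounded cleanup and exits before the SIGKILL deadline instead of trying to keep running.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile sig_atomic_t terminating = 0;

void on_sigterm(int sig)
{
    terminating = 1;   // only set a flag; do the real work outside the handler
}

int main(void)
{
    signal(SIGTERM, on_sigterm);

    // placeholder work loop: do a bounded chunk of work, then re-check the flag
    while (!terminating)
        sleep(1);

    // quick, bounded cleanup: flush/close files, write a small checkpoint, etc.
    fprintf(stderr, "caught SIGTERM, cleaning up and exiting\n");
    return EXIT_FAILURE;   // exit well before mpirun's SIGKILL deadline
}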
>>>> Given the behaviour described in the last paragraph, I think it would be great to explicitly mention the SIGKILL in the man page, or even better, to fix the implementation to send SIGTERM instead, making it possible for the user to clean up all processes before exit.
>>>>
>>>> We already do, as described above.
>>>>
>>>> I solved my particular problem by adding another flag, *unexpected_error_on_slave*:
>>>>
>>>> volatile sig_atomic_t unexpected_error_occurred = 0;
>>>> int unexpected_error_on_slave = 0;
>>>> enum tag { work_tag, die_tag };
>>>>
>>>> void my_handler( int sig ){
>>>>     unexpected_error_occurred = 1;
>>>> }
>>>>
>>>> //
>>>> // somewhere in the code... (world is a boost::mpi::communicator, root is the root rank)
>>>> //
>>>> signal(SIGTERM, my_handler);
>>>>
>>>> if (world.rank() == root) {
>>>>
>>>>     // do stuff
>>>>
>>>>     world.recv(mpi::any_source, die_tag, unexpected_error_on_slave);
>>>>     if ( unexpected_error_occurred || unexpected_error_on_slave ) {
>>>>
>>>>         // save something
>>>>
>>>>         world.abort(SIGABRT);
>>>>     }
>>>> } else { // slave process
>>>>
>>>>     // do different stuff
>>>>
>>>>     if ( unexpected_error_occurred ) {
>>>>
>>>>         // just communicate the problem to the root
>>>>         world.send(root, die_tag, 1);
>>>>         signal(SIGTERM, SIG_DFL);
>>>>         while (true)
>>>>             ; // wait, the master will take care of this
>>>>     }
>>>>     world.send(root, die_tag, 0); // everything is fine
>>>> }
>>>> signal(SIGTERM, SIG_DFL); // reassign the default handler
>>>>
>>>> // continues the code...
>>>>
>>>> Note the slave must hang so the save operation gets executed at the root, otherwise we are back to the previous scenario. It is theoretically unnecessary to send MPI messages to accomplish the desired cleanup, and in more complex applications this can turn into a nightmare. As we know, asynchronous events are insane to debug.
>>>>
>>>> Best regards,
>>>> Júlio.
>>>>
>>>> P.S.: Open MPI 1.4.3 from the Ubuntu 11.10 repositories.
>>>>
>>>> 2012/3/23 Ralph Castain <r...@open-mpi.org>
>>>>
>>>>> Well, yes and no. When a process abnormally terminates, OMPI will kill the job - this is done by first hitting each process with a SIGTERM, followed shortly thereafter by a SIGKILL. So you do have a short time on each process to attempt to clean up.
>>>>>
>>>>> My guess is that your signal handler actually is getting called, but we then kill the process before you can detect that it was called.
>>>>>
>>>>> You might try adjusting the time between SIGTERM and SIGKILL using the odls_base_sigkill_timeout MCA param:
>>>>>
>>>>> mpirun -mca odls_base_sigkill_timeout N
>>>>>
>>>>> should cause it to wait for N seconds before issuing the SIGKILL. Not sure if that will help or not - it used to work for me, but I haven't tried it for a while. What versions of OMPI are you using?
>>>>>
>>>>> On Mar 22, 2012, at 4:49 PM, Júlio Hoffimann wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I'm trying to handle signals inside an MPI task farming model.
>>>>> Following is pseudo-code of what I'm trying to achieve:
>>>>>
>>>>> volatile sig_atomic_t unexpected_error_occurred = 0;
>>>>>
>>>>> void my_handler( int sig ){
>>>>>     unexpected_error_occurred = 1;
>>>>> }
>>>>>
>>>>> //
>>>>> // somewhere in the code... (rank and root come from the MPI communicator)
>>>>> //
>>>>> signal(SIGTERM, my_handler);
>>>>>
>>>>> if (rank == root) {
>>>>>
>>>>>     // do stuff
>>>>>
>>>>>     if ( unexpected_error_occurred ) {
>>>>>
>>>>>         // save something
>>>>>
>>>>>         // re-raise SIGTERM, but now with the default handler
>>>>>         signal(SIGTERM, SIG_DFL);
>>>>>         raise(SIGTERM);
>>>>>     }
>>>>> } else { // slave process
>>>>>
>>>>>     // do different stuff
>>>>>
>>>>>     if ( unexpected_error_occurred ) {
>>>>>
>>>>>         // just propagate the signal to the root
>>>>>         signal(SIGTERM, SIG_DFL);
>>>>>         raise(SIGTERM);
>>>>>     }
>>>>> }
>>>>> signal(SIGTERM, SIG_DFL); // reassign the default handler
>>>>>
>>>>> // continues the code...
>>>>>
>>>>> As can be seen, the signal handling is required for implementing a restart feature. The whole problem resides in my assumption that all processes in the communicator will receive a SIGTERM as a side effect. Is that a valid assumption? How does the actual MPI implementation deal with such scenarios?
>>>>>
>>>>> I also tried to replace all the raise() calls with MPI_Abort(), which according to the documentation (http://www.open-mpi.org/doc/v1.5/man3/MPI_Abort.3.php) sends a SIGTERM to all associated processes. The undesired behaviour persists: when killing a slave process, the save section in the root branch is not executed.
>>>>>
>>>>> Appreciate any help,
>>>>> Júlio.
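As a footnote on the MPI_Abort() variant mentioned above, a minimal self-contained sketch (plain MPI C API, my own illustration rather than code from the thread) of aborting the whole job from a rank that detects a fatal error; the other ranks are then torn down by the runtime (SIGTERM followed shortly by SIGKILL), so they cannot be relied on to run arbitrary cleanup code of their own:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int fatal_error = 0;   // set to 1 wherever the real error is detected

    if (fatal_error) {
        fprintf(stderr, "rank %d: fatal error, aborting the whole job\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);   // error code is passed back to mpirun
    }

    MPI_Finalize();
    return 0;
}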