Hi, I recently installed Open MPI (4.0.3) using the procedure described here <https://www.open-mpi.org/faq/?category=building>, as I'm trying to use Horovod for multi-GPU acceleration.
I am looking for a way to handle a keyboard interrupt, so that I can save a deep learning model before everything shuts down. I posted a question here <https://github.com/horovod/horovod/issues/1903>.

I have seen this thread <https://www.mail-archive.com/users@lists.open-mpi.org/msg26892.html>, which is inconclusive, and this specific message <https://www.mail-archive.com/users@lists.open-mpi.org/msg26894.html>, which describes exactly the situation I'm in. I've also seen that this earlier one <https://www.mail-archive.com/users@lists.open-mpi.org/msg31805.html> mentions SIGINT being received. Strangely enough, when I tried to print the signal I got SIGCONT instead. The result is the same as above anyway: my subprocesses just stop without any handling.

I'm wondering if there is a way of delaying the shutdown of my GPU processes so that I can save the latest state of the model. It would be practical.

Many thanks in advance for your help,
Jeremie