Hi,
I've got a question on suspending/resuming a process started with
"mpirun". I've already found the FAQ entry on this
(http://www.open-mpi.de/faq/?category=running#suspend-resume), but I've
still got a question. For now, let's assume I'm running
all MPI processes on one host only with one multi-core CPU (so I could
directly send SIGSTOP to other processes if I want to). What I wonder
about is the following: I want to start multiple (let's say four)
instances of my program with "mpirun -np 4 ./mybinary" and at some point
during program execution I want to suspend two of those four
processes. Those two processes are waiting at an MPI_Barrier() at this
point. The goal is to suspend execution so that those processes
don't use the CPU at all while they are suspended (as far as I
understand, merely waiting in MPI_Barrier does not achieve that, because
the waiting processes still use the CPU). So now my question
basically is: if I send SIGSTOP from my MPI rank 0 process to these two
processes while they are waiting at an MPI_Barrier, will that work, and
will those two processes then stop using the CPU? Later I want to
resume them with SIGCONT once the other two processes have also
arrived at this MPI_Barrier. Performance of the barrier does not matter
here; what matters to me is that the suspended processes don't cause
any CPU usage. I have never used SIGSTOP before, so I'm not sure whether
this will work. Before I start coding the logic for this into my
program, I thought I'd ask here first whether this will work at all :).
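
For illustration, the bare signal mechanism (without any MPI involved)
could be sketched roughly as below. This is only a sketch under
assumptions: the busy-looping child is a hypothetical stand-in for a rank
spinning in a barrier, and the function name suspend_and_resume is made
up. The child is started by the sender here so that waitpid() can confirm
the stop and the continue. In the MPI case the other ranks are not
children of rank 0, so that confirmation would not be available, and the
PIDs would have to be exchanged some other way (for example each rank
sending its getpid() result to rank 0, which is again an assumption).

```python
import os
import signal
import subprocess
import sys


def suspend_and_resume():
    """Stop a busy-looping child with SIGSTOP, then resume it with SIGCONT.

    Returns (was_stopped, was_continued) as reported by waitpid().
    """
    # Hypothetical stand-in for a process spinning in a barrier: a busy loop.
    child = subprocess.Popen([sys.executable, "-c", "while True: pass"])
    try:
        # SIGSTOP cannot be caught or ignored; the kernel stops the process.
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid() report children that have been stopped.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        # SIGCONT lets the stopped process run again.
        os.kill(child.pid, signal.SIGCONT)
        # WCONTINUED makes waitpid() report children that have been resumed.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        was_continued = os.WIFCONTINUED(status)
        return was_stopped, was_continued
    finally:
        # Clean up the busy loop regardless of what happened above.
        child.kill()
        child.wait()


if __name__ == "__main__":
    print(suspend_and_resume())
```

Whether an MPI library tolerates having some of its ranks stopped in the
middle of a barrier is exactly the open question; the sketch only shows
that the operating system side of SIGSTOP/SIGCONT behaves as described.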

Frank
