Dear all, I installed the developer version (r14519) and was able to
get it running. I successfully checkpointed a parallel job and
restarted it. My question is: how can I checkpoint the restarted job?
The problem is that once the original job has been terminated and
restarted later, mpirun no longer exists (ps -efa | grep mpirun), so
I do not know which PID to use when I run ompi-checkpoint on the
restarted job. Any help would be greatly appreciated.
- [OMPI users] How to restart a job twice Tamer