Dear all, I installed the developer's version r14519 and was able to
get it running. I successfully checkpointed a parallel job and
restarted it. My question is how can I checkpoint the restarted job?
The problem is that once the original job is terminated and restarted later
on, the mpirun does [...]n completely
and I would have to go to r18208? Thank you in advance for your help.
Tamer
On Apr 18, 2008, at 6:03 AM, Josh Hursey wrote:
When you use 'ompi-restart' to restart a job, it fork/execs a
completely new job using the restarted processes for the ranks.
However, instead o [...] ng without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
Thank you in advance for your help.
Tamer
On Apr 18, 2008, at 7:07 AM, Josh Hursey wrote:
This problem has come up in the past and [...]
checkpoints and restarts as many times as I want to without any
problems. This means that the issue above must be platform-dependent
and that I must be missing some option when building the code.
Cheers,
Tamer
On Apr 22, 2008, at 5:52 PM, Josh Hursey wrote:
Tamer,
This should now be fixed in [...]'t give me an error message. Has this problem been reported before?
All the required executables and libraries are in my path.
Thanks,
Tamer
On Apr 29, 2008, at 1:37 PM, Sharon Brunett wrote:
Thanks, I'll try the version you recommend below!
Josh Hursey wrote:
Your previous emai