Hi
I'm trying to get fault-tolerant Open MPI running on our cluster for my
semester thesis.
On the login node I was successful; checkpointing works.
Since the compute nodes have different kernels, I had to compile BLCR on the
compute nodes again. BLCR on the compute nodes works. After that I instal
roman
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
Hellmüller Roman [hro...@student.ethz.ch]
Sent: Wednesday, 30 March 2011 16:33
To: us...@open-mpi.org
Subject: [OMPI users] Fault tolerant ompi - Error: Unable to find a list of
active M
Solved.
I don't know exactly how. I just kept working on it and set some other
parameters/directories.
cheers
roman
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
Hellmüller Roman [hro...@student.ethz.ch]
Sent: Donne
Hi
I'm trying to get fault-tolerant Open MPI running on our cluster for my
semester thesis.
Build and compile were successful, and BLCR checkpointing works (Open MPI 1.5.3,
BLCR 0.8.2).
Now I'm trying to set up SELF checkpointing. The example from
http://osl.iu.edu/research/ft/ompi-cr/examples.php does
Sent: Wednesday, 6 April 2011 13:20
To: Open MPI Users
Subject: Re: [OMPI users] openmpi self checkpointing - error while running
example
Hi Roman,
Did you try to checkpoint and restart with the "-machinefile" parameter? It may
work.
Regards,
Nguyen Toan
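
For readers following the thread, the suggestion above can be sketched roughly as the dry-run script below. It only prints the command lines instead of running them. The machinefile path, the application name `./my_app`, the process count, the PID `1234`, and the snapshot handle are all placeholders assumed for illustration, not values from this thread; `-am ft-enable-cr` is how checkpoint/restart is usually enabled at launch in the Open MPI 1.5 series.

```shell
#!/bin/sh
# Dry-run sketch: prints the checkpoint/restart commands, does not execute them.
# All arguments (machinefile, app, PID, snapshot handle) are hypothetical placeholders.

# Launch with checkpoint/restart support and an explicit machinefile.
launch_cmd() {
    # $1 = machinefile, $2 = number of processes, $3 = application
    printf 'mpirun -am ft-enable-cr -machinefile %s -np %s %s\n' "$1" "$2" "$3"
}

# Checkpoint the running job by the PID of its mpirun process.
checkpoint_cmd() {
    # $1 = PID of mpirun
    printf 'ompi-checkpoint -v %s\n' "$1"
}

# Restart from a global snapshot, passing the same machinefile again.
restart_cmd() {
    # $1 = machinefile, $2 = global snapshot handle
    printf 'ompi-restart -v -machinefile %s %s\n' "$1" "$2"
}

launch_cmd ./machines 2 ./my_app
checkpoint_cmd 1234
restart_cmd ./machines ompi_global_snapshot_1234.ckpt
```

The point of the sketch is only that the same machinefile is passed both at launch and at restart; whether that resolves the error depends on the cluster setup.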
On Wed, Apr 6, 2011 at 7:05
e MACHINES_FILE". Hope
it works.
On Wed, Apr 6, 2011 at 9:13 PM, Hellmüller Roman
<hro...@student.ethz.ch> wrote:
Hi Toan
Thanks for your suggestion. It gives me the following result, which does not
tell me anything more.
hroman@cbl1 ~/checkpoints $ ompi-restart -v -machin