Re: [OMPI users] checkpoint problem

2012-07-28 Thread Josh Hursey
Currently you have to do as Reuti mentioned (use the queuing system, or create a script). We do have a feature request ticket open for this feature if you are interested in following the progress: https://svn.open-mpi.org/trac/ompi/ticket/1961 It has been open for a while, but the feature should

Re: [OMPI users] checkpoint problem

2012-07-23 Thread Reuti
On 23.07.2012 at 10:02, 陈松 wrote:
> How can I create ckpt files regularly? I mean, do checkpoint every 100
> seconds. Is there any options to do this? Or I have to write a script myself?

Yes, or use a queuing system which supports creation of a checkpoint in fixed time intervals.

-- Reuti
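
For readers taking the "write a script" route, here is a minimal sketch in Python. It assumes the job is already running under mpirun with checkpoint/restart support enabled, and that `ompi-checkpoint <PID of mpirun>` is the way to request a checkpoint (the same invocation that appears elsewhere in this archive). The function names, the 100-second default, and the stop-on-first-failure behaviour are illustrative choices, not anything Open MPI provides.

```python
import subprocess
import time


def build_command(mpirun_pid):
    """Return the argv for one checkpoint request against a running mpirun."""
    # Assumption: ompi-checkpoint takes the PID of mpirun as its argument.
    return ["ompi-checkpoint", str(mpirun_pid)]


def checkpoint_periodically(mpirun_pid, interval_s=100, max_checkpoints=None,
                            run=subprocess.call, sleep=time.sleep):
    """Request a checkpoint every interval_s seconds.

    Stops when the command returns a non-zero exit status (for example
    because mpirun has already exited) or after max_checkpoints successful
    requests. Returns the number of successful checkpoint requests.
    `run` and `sleep` are injectable so the loop can be tried without MPI.
    """
    taken = 0
    while max_checkpoints is None or taken < max_checkpoints:
        sleep(interval_s)
        if run(build_command(mpirun_pid)) != 0:
            break
        taken += 1
    return taken
```

Calling `checkpoint_periodically(20109)` would then request a checkpoint of the job whose mpirun has PID 20109 every 100 seconds until the command fails. The interval is measured between the end of one checkpoint and the start of the next, so the real period is slightly longer than `interval_s`.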

[OMPI users] checkpoint problem

2012-07-23 Thread 陈松
Hi all,

How can I create ckpt files regularly? I mean, do checkpoint every 100 seconds. Is there any options to do this? Or I have to write a script myself?

THANKS,
---
CHEN Song
R&D Department
National Supercomputer Center in Tianjin
Binhai New Area, Tianjin, China

Re: [OMPI users] Checkpoint problem with BLCR + OpenMPI

2010-08-27 Thread Joshua Hursey
On Aug 27, 2010, at 3:52 AM, 陈文浩 wrote:
> Dear OMPI Users,
>
> I have installed BLCR(0.8.2) and OpenMPI(1.4.2) successfully. But now I met a
> problem when I take a checkpoint.
> I run CG NPB(NPROCS=16, two nodes: blade02 & blade04, CLASS=C, NFS: $HOME &
> /opt are shared)
>
> BLCR configur

[OMPI users] Checkpoint problem with BLCR + OpenMPI

2010-08-27 Thread 陈文浩
Dear OMPI Users, I have installed BLCR(0.8.2) and OpenMPI(1.4.2) successfully. But now I met a problem when I take a checkpoint. I run CG NPB(NPROCS=16, two nodes: blade02 & blade04, CLASS=C, NFS: $HOME & /opt are shared) BLCR configure: ./configure --prefix=/opt/blcr --enable-static Open

Re: [OMPI users] Checkpoint problem

2008-08-23 Thread Gabriele Fatigati
Well, as you suggested, I've installed the latest Open MPI nightly, version 1.4a1r19370. Now the checkpoint procedure works well and the related restart files are correctly created, but the process restart fails. After the restart command, the process starts but remains frozen, doing nothing, and then dies.

Re: [OMPI users] Checkpoint problem

2008-08-20 Thread Tim Mattox
Hello,

Three things...

1) Josh, the main developer for checkpoint/restart, has been away for a few weeks and has just returned. I suspect he will get unburied from e-mail in another day or two.

2) The 1.4 (and 1.3) branch is very much under rapid development, and there will be times when basic fu

Re: [OMPI users] Checkpoint problem

2008-08-20 Thread Ralph Castain
There was a bug that caused ompi-checkpoint not to find the correct place in the session directory for mpirun's contact file. This was fixed in r19265, so you should no longer have a problem. On Aug 20, 2008, at 2:11 AM, Matthias Hovestadt wrote: Hi Gabriele! In this case, mpirun works w

Re: [OMPI users] Checkpoint problem

2008-08-20 Thread Matthias Hovestadt
Hi Gabriele!

In this case, mpirun works well, but the checkpoint procedure fails:

ompi-checkpoint 20109
[node0316:20134] Error: Unable to get the current working directory
[node0316:20134] [[42404,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 395
[node0316:20134] HNP with PI

[OMPI users] Checkpoint problem

2008-08-20 Thread Gabriele Fatigati
Dear OpenMPI developers, I'm testing checkpoint and restart with the OpenMPI 1.4 nightly. The test machine is an IBM Blade System over InfiniBand with 4 processors per node. At the moment, I have some problems. My application is a simple communication ring between processors, with parametric