Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-02-05 Thread Josh Hursey
This is a bit late in the thread, but I wanted to add one more note. The functionality that made it to v1.6 is fairly basic in terms of C/R support in Open MPI. It supported a global checkpoint write, and (for a time) a simple staged option (I think that is now broken). In the trunk (about 3 year

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-30 Thread Maxime Boissonneault
On 2013-01-29 21:02, Ralph Castain wrote: On Jan 28, 2013, at 10:53 AM, Maxime Boissonneault > wrote: While our filesystem and management nodes are on UPS, our compute nodes are not. With one average generic (power/cooling mostly) failure ever

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-30 Thread Constantinos Makassikis
On Wed, Jan 30, 2013 at 3:02 AM, Ralph Castain wrote: > > If your node hardware is the problem, or you decide you do want/need to > pursue an FT solution, then you might look at the OMPI-based solutions from > parties such as http://fault-tolerance.org or the MPICH2 folks. > Just as Ralph said,

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-29 Thread Ralph Castain
On Jan 28, 2013, at 10:53 AM, Maxime Boissonneault wrote: > While our filesystem and management nodes are on UPS, our compute nodes are > not. With one average generic (power/cooling mostly) failure every one or two > months, running for weeks is just asking for trouble. If you add to that >

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread George Bosilca
Based on the paper you linked, the answer is quite obvious. The proposed CRFS mechanism supports all of the checkpoint-enabled MPI implementations, so you just have to go with the one that provides, and cares about, the services you need. George. On Mon, Jan 28, 2013 at 3:46 PM, Maxime Boissonneault

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
Hi George, The problem here is not the bandwidth, but the number of IOPs. I wrote to the BLCR list, and they confirmed that : "While ideally the checkpoint would be written in sizable chunks, the current code in BLCR will issue a single write operation for each contiguous range of user memory,

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread George Bosilca
At the scale you address, you should have no trouble with C/R if the file system is correctly configured. We get more bandwidth per node out of NFS over 1Gb/s at 32 nodes. Have you run some parallel benchmarks on your cluster? George. PS: You can find some MPI I/O benchmarks at http://www.mcs.

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Ralph Castain
On Jan 28, 2013, at 10:53 AM, Maxime Boissonneault wrote: > On 2013-01-28 13:15, Ralph Castain wrote: >> On Jan 28, 2013, at 9:52 AM, Maxime Boissonneault >> wrote: >> >>> On 2013-01-28 12:46, Ralph Castain wrote: On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault wrote: >>>

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
On 2013-01-28 13:15, Ralph Castain wrote: On Jan 28, 2013, at 9:52 AM, Maxime Boissonneault wrote: On 2013-01-28 12:46, Ralph Castain wrote: On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault wrote: Hello Ralph, I agree that ideally, someone would implement checkpointing in the appl

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Ralph Castain
On Jan 28, 2013, at 9:52 AM, Maxime Boissonneault wrote: > On 2013-01-28 12:46, Ralph Castain wrote: >> On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault >> wrote: >> >>> Hello Ralph, >>> I agree that ideally, someone would implement checkpointing in the >>> application itself, but that

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
On 2013-01-28 12:46, Ralph Castain wrote: On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault wrote: Hello Ralph, I agree that ideally, someone would implement checkpointing in the application itself, but that is not always possible (commercial applications, use of complicated libraries, a

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Ralph Castain
On Jan 28, 2013, at 8:25 AM, Maxime Boissonneault wrote: > Hello Ralph, > I agree that ideally, someone would implement checkpointing in the > application itself, but that is not always possible (commercial applications, > use of complicated libraries, algorithms with no clear progression poi

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
Hello Ralph, I agree that ideally, someone would implement checkpointing in the application itself, but that is not always possible (commercial applications, use of complicated libraries, algorithms with no clear progression points at which you can interrupt the algorithm and start it back fro

Re: [OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Ralph Castain
Our c/r person has moved on to a different career path, so we may not have anyone who can answer this question. What we can say is that checkpointing at any significant scale will always be a losing proposition. It just takes too long and hammers the file system. People have been working on ext

[OMPI users] Checkpointing an MPI application with OMPI

2013-01-28 Thread Maxime Boissonneault
Hello, I am doing checkpointing tests (with BLCR) with an MPI application compiled with OpenMPI 1.6.3, and I am seeing behaviors that are quite strange. First, some details about the tests: - The only filesystems available on the nodes are 1) one tmpfs, 2) one Lustre shared filesystem (tested

Re: [OMPI users] checkpointing of NPB

2012-06-24 Thread Ifeanyi
Thanks Josh for the reply. I will try patching, and possibly other benchmarking. Regards, Ifeanyi On Wed, Jun 20, 2012 at 11:51 PM, Josh Hursey wrote: > Ifeanyi, > > I am usually the one that responds to checkpoint/restart questions, > but unfortunately I do not have time to look into this is

Re: [OMPI users] checkpointing of NPB

2012-06-20 Thread Josh Hursey
Ifeanyi, I am usually the one that responds to checkpoint/restart questions, but unfortunately I do not have time to look into this issue at the moment (and probably won't for at least a few more months). There are a few other developers that work on the checkpoint/restart functionality that might

[OMPI users] checkpointing of NPB

2012-06-19 Thread Ifeanyi
Dear all, Please help. I configured Open MPI and it can checkpoint HPL. However, whenever I try to checkpoint the NAS Parallel Benchmarks, it kills the application without an informative message. How do I configure Open MPI 1.6 to checkpoint NPB? I really need help; I have been on this issu

[OMPI users] checkpointing

2012-06-14 Thread Ifeanyi
Hi, Please help. I have installed openmpi-1.6 and tested the installation with different MPI applications, and my application executed successfully. When I run NPB-3.3 LU without checkpointing, it completes successfully. However, whenever I checkpoint the application, it abort

Re: [OMPI users] checkpointing/restart of hpl

2012-06-05 Thread Ifeanyi
Thanks Constantinos. I have gone through the site you sent me; however, whenever I try to enable FT, it does not get enabled. This is what I got: /openmpi-1.6# ./configure --enable-ft-thread --with-ft=cr --with-blcr=/usr/src/blcr-0.8.2 #FT Checkpoint support: no (checkpoint thread: no) "FT Chec

Re: [OMPI users] checkpointing/restart of hpl

2012-06-04 Thread Constantinos Makassikis
Hi, you may start by looking at http://www.open-mpi.org/faq/?category=ft which leads you to https://svn.open-mpi.org/trac/ompi/wiki/ProcessFT_CR and http://osl.iu.edu/research/ft/ompi-cr/ The latter is the most up-to-date link and probably what you are looking for. HTH, -- Constantinos On Mon

[OMPI users] checkpointing/restart of hpl

2012-06-03 Thread Ifeanyi
Dear all, I am a new user of Open MPI. I have already installed Open MPI and built HPL. I want to checkpoint/restart HPL and compare its performance. Please can you point me to a useful link that will guide me through checkpointing HPL? Thanks in advance. Regards, Ifeanyi

Re: [OMPI users] checkpointing on other transports

2012-01-17 Thread Josh Hursey
I have not tried to support an MTL with the checkpointing functionality, so I do not have first-hand experience with those, just the OB1/BML/BTL stack. The difficulty in porting to a new transport is really a function of how the transport interacts with the checkpointer (e.g., BLCR). The draining

[OMPI users] checkpointing on other transports

2012-01-12 Thread Dave Love
What would be involved in adding checkpointing to other transports, specifically the PSM MTL? Are there (likely to be?) technical obstacles, and would it be a lot of work if not? I'm asking in case it would be easy, and we don't have to exclude QLogic from a procurement, given they won't respond

Re: [OMPI users] Checkpointing mpi4py program (Probably bcast issue)

2010-08-20 Thread ananda.mudar
e > > - Ananda > > Ananda B Mudar, PMP > Senior Technical Architect > Wipro Technologies > Ph: 972 765 8093 > ananda.mudar_at_[hidden] > > From: Ananda Babu Mudar > Sen

Re: [OMPI users] Checkpointing mpi4py program (Probably bcast issue)

2010-08-18 Thread ananda.mudar
d any other information. Thanks Ananda Ananda B Mudar, PMP Senior Technical Architect Wipro Technologies Ph: 972 765 8093 ananda.mu...@wipro.com --- Original Message --- Subject: Re: [OMPI users] Checkpointing mpi4py program From: Joshua Hursey (jjhursey_at_[h

Re: [OMPI users] Checkpointing mpi4py program

2010-08-18 Thread Joshua Hursey
ve reported. > > Let me know if you need any additional information. > > Thanks for your time in advance > > - Ananda > > Ananda B Mudar, PMP > Senior Technical Architect > Wipro Technologies > Ph: 972 765 8093 > ananda.mu...@wipro.com > >

Re: [OMPI users] Checkpointing mpi4py program

2010-08-16 Thread ananda.mudar
logies Ph: 972 765 8093 ananda.mu...@wipro.com From: Ananda Babu Mudar (WT01 - Energy and Utilities) Sent: Sunday, August 15, 2010 11:25 PM To: us...@open-mpi.org Subject: Re: [OMPI users] Checkpointing mpi4py program Importance: High Josh I tried running the mpi4py program with the late

Re: [OMPI users] Checkpointing mpi4py program

2010-08-16 Thread ananda.mudar
e let me know if you need any additional information. Thanks for your time in advance Thanks Ananda -Original Message- Subject: Re: [OMPI users] Checkpointing mpi4py program From: Joshua Hursey (jjhursey_at_[hidden]) List-Post: users@lists.open-mpi.org Date: 2010-08-13 12:28:31 No

Re: [OMPI users] Checkpointing mpi4py program

2010-08-13 Thread Joshua Hursey
em. I will see if I have to change any other environment variables >> to have a successful compilation. I will keep you posted. >> >> BTW, were you successful in reproducing the problem on a system with >> OpenMPI 1.4.2? >> >> Thanks >> Ananda >

Re: [OMPI users] Checkpointing mpi4py program

2010-08-13 Thread ananda.mudar
> Thanks > Ananda > -Original Message- > Date: Thu, 12 Aug 2010 09:12:26 -0400 > From: Joshua Hursey > Subject: Re: [OMPI users] Checkpointing mpi4py program > To: Open MPI Users > Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org> > Content

Re: [OMPI users] Checkpointing mpi4py program

2010-08-13 Thread ananda.mudar
: Joshua Hursey Subject: Re: [OMPI users] Checkpointing mpi4py program To: Open MPI Users Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org> Content-Type: text/plain; charset=us-ascii Can you try this with the current trunk (r23587 or later)? I just added a number of new features a

Re: [OMPI users] Checkpointing mpi4py program

2010-08-12 Thread Joshua Hursey
...@wipro.com > > > -Original Message----- > Date: Mon, 9 Aug 2010 16:37:58 -0400 > From: Joshua Hursey > Subject: Re: [OMPI users] Checkpointing mpi4py program > To: Open MPI Users > Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org> > Content-Ty

Re: [OMPI users] Checkpointing mpi4py program

2010-08-10 Thread ananda.mudar
users] Checkpointing mpi4py program To: Open MPI Users Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org> Content-Type: text/plain; charset=windows-1252 I have not tried to checkpoint an mpi4py application, so I cannot say for sure if it works or not. You might be hitting somethin

Re: [OMPI users] Checkpointing mpi4py program

2010-08-09 Thread Joshua Hursey
I have not tried to checkpoint an mpi4py application, so I cannot say for sure if it works or not. You might be hitting something with the Python runtime interacting in an odd way with either Open MPI or BLCR. Can you attach a debugger and get a backtrace on a stuck checkpoint? That might show

[OMPI users] Checkpointing mpi4py program

2010-08-09 Thread ananda.mudar
Hi, I have integrated mpi4py with openmpi 1.4.2, which was built with BLCR 0.8.2. When I run ompi-checkpoint on a program written using mpi4py, I see that the program sometimes doesn't resume after successful checkpoint creation. This doesn't always occur, meaning the program resumes after successful

Re: [OMPI users] checkpointing multi node and multi process applications

2010-03-04 Thread Joshua Hursey
On Mar 4, 2010, at 8:17 AM, Fernando Lemos wrote: > On Wed, Mar 3, 2010 at 10:24 PM, Fernando Lemos wrote: > >> Is there anything I can do to provide more information about this bug? >> E.g. try to compile the code in the SVN trunk? I also have kept the >> snapshots intact, I can tar them up an

Re: [OMPI users] checkpointing multi node and multi process applications

2010-03-04 Thread Fernando Lemos
On Wed, Mar 3, 2010 at 10:24 PM, Fernando Lemos wrote: > Is there anything I can do to provide more information about this bug? > E.g. try to compile the code in the SVN trunk? I also have kept the > snapshots intact, I can tar them up and upload them somewhere in case > you guys need it. I can a

[OMPI users] checkpointing multi node and multi process applications

2010-03-03 Thread Fernando Lemos
Hi, First, I'm hoping setting the subject of this e-mail will get it attached to the thread that starts with this e-mail: http://www.open-mpi.org/community/lists/users/2009/12/11608.php The reason I'm not replying to that thread is that I wasn't subscribed to the list at the time. My environm

Re: [OMPI users] checkpointing multi node and multi process applications

2010-01-25 Thread Josh Hursey
ruct me on how to resolve this problem. Thank you Jean --- On Mon, 11/1/10, Josh Hursey wrote: From: Josh Hursey Subject: Re: [OMPI users] checkpointing multi node and multi process applications To: "Open MPI Users" Date: Monday, 11 January, 2010, 21:42 On Dec 19, 2

Re: [OMPI users] checkpointing multi node and multi process applications

2010-01-25 Thread Josh Hursey
, Josh Hursey wrote: From: Josh Hursey Subject: Re: [OMPI users] checkpointing multi node and multi process applications To: "Open MPI Users" Date: Monday, 11 January, 2010, 21:42 On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote: > Hi Everyone, >I

Re: [OMPI users] checkpointing multi node and multi process applications

2010-01-25 Thread Josh Hursey
can be wrong? Please instruct me on how to resolve this problem. Thank you Jean --- On Mon, 11/1/10, Josh Hursey wrote: From: Josh Hursey Subject: Re: [OMPI users] checkpointing multi node and multi process applications To: "Open MPI Users" Date: Monday, 11 January, 2010

[OMPI users] checkpointing multi node and multi process applications

2010-01-21 Thread Jean Potsam
--- On Mon, 11/1/10, Josh Hursey wrote: From: Josh Hursey Subject: Re: [OMPI users] checkpointing multi node and multi process applications To: "Open MPI Users" List-Post: users@lists.open-mpi.org Date: Monday, 11 January, 2010, 21:42 On Dec 19, 2009, at 7:

Re: [OMPI users] checkpointing multi node and multi process applications

2010-01-11 Thread Josh Hursey
On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote: Hi Everyone, I am trying to checkpoint an MPI application running on multiple nodes. However, I get some error messages when I trigger the checkpointing process. Error: expected_component: PID information unavailable!

[OMPI users] checkpointing multi node and multi process applications

2009-12-19 Thread Jean Potsam
Hi Everyone, I am trying to checkpoint an MPI application running on multiple nodes. However, I get some error messages when I trigger the checkpointing process. Error: expected_component: PID information unavailable! Error: expected_component: Component Name information u

Re: [OMPI users] checkpointing 2 or more processes running in parallel

2009-09-08 Thread Josh Hursey
Though I would not recommend your technique for initiating a checkpoint from an application, it may work. Since ompi-checkpoint will need to contact and interact with every MPI process, this could cause problems if the application is blocking in system() while ompi-checkpoint is trying to i

[OMPI users] checkpointing 2 or more processes running in parallel

2009-08-27 Thread Jean Potsam
Dear all, I am trying to checkpoint an MPI application at specific points in my program. So, I created a small function as follows: void mychkpt() { system("ompi-checkpoint -v `pidof mpirun`"); } and I am calling it in my MPI application at specific points, e.g. ## pri

Re: [OMPI users] Checkpointing automatically at regular intervals

2009-07-02 Thread Josh Hursey
running application. I would imagine an automatic restart from the last checkpoint in case of failure would also be interesting. Many thanks. Regards, Kritiraj --- On Tue, 6/30/09, Josh Hursey wrote: From: Josh Hursey Subject: Re: [OMPI users] Checkpointing automatically at regular

Re: [OMPI users] Checkpointing automatically at regular intervals

2009-06-30 Thread Kritiraj Sajadah
restart from the last checkpoint in case of failure would also be interesting. Many thanks. Regards, Kritiraj --- On Tue, 6/30/09, Josh Hursey wrote: > From: Josh Hursey > Subject: Re: [OMPI users] Checkpointing automatically at regular intervals > To: "Open MPI Users" >

Re: [OMPI users] Checkpointing automatically at regular intervals

2009-06-30 Thread Reuti
Hi, On 30.06.2009 at 14:29, Kritiraj Sajadah wrote: I can manually checkpoint an MPI application using Open MPI and BLCR. However, I now want to checkpoint my application automatically every 5 minutes. Is there a way in Open MPI to ensure automatic checkpointing without the

Re: [OMPI users] Checkpointing automatically at regular intervals

2009-06-30 Thread Josh Hursey
Currently, there is no mechanism to checkpoint every X minutes in Open MPI. As mentioned below you can use a script to initiate the checkpoint every X minutes. Alternatively it should not be too difficult to add such a feature to Open MPI. If enough people would be interested I can file a

Re: [OMPI users] Checkpointing automatically at regular intervals

2009-06-30 Thread Mohamed Slim bouguerra
Hi, I think that you can write a simple script such as: while [ -n "`pgrep mpirun`" ]; do ompi-checkpoint `pidof mpirun`; sleep 300; done On 30 June 09 at 14:29, Kritiraj Sajadah wrote: Dear All, I can manually checkpoint an MPI application using Open MPI and BLCR. However, I now want to ch

[OMPI users] Checkpointing automatically at regular intervals

2009-06-30 Thread Kritiraj Sajadah
Dear All, I can manually checkpoint an MPI application using Open MPI and BLCR. However, I now want to checkpoint my application automatically every 5 minutes. Is there a way in Open MPI to ensure automatic checkpointing without user intervention while the application is run

Re: [OMPI users] Checkpointing configuration problem

2009-05-01 Thread Yaakoub El Khamra
You might want to consider --enable-mpi-threads=yes Regards Yaakoub El Khamra On Fri, May 1, 2009 at 3:17 PM, Kritiraj Sajadah wrote: > > Dear all, > I am trying to install openmpi 1.3 on my laptop. I successfully > installed BLCR in /usr/local. > > When installing openmpi using

Re: [OMPI users] Checkpointing configuration problem

2009-05-01 Thread Josh Hursey
Try replacing "--enable-MPI-thread" with "--enable-mpi-threads". That should fix it. -- Josh On May 1, 2009, at 4:17 PM, Kritiraj Sajadah wrote: Dear all, I am trying to install openmpi 1.3 on my laptop. I successfully installed BLCR in /usr/local. When installing openmpi u

[OMPI users] Checkpointing configuration problem

2009-05-01 Thread Kritiraj Sajadah
Dear all, I am trying to install openmpi 1.3 on my laptop. I successfully installed BLCR in /usr/local. When installing openmpi using the following options: ./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread --enable-MPI-thread --with-blcr=/usr/local I got the follo

Re: [OMPI users] Checkpointing hangs with OpenMPI-1.3.1

2009-04-28 Thread Josh Hursey
12:34 AM Please respond to Open MPI Users To Open MPI Users cc Subject Re: [OMPI users] Checkpointing hangs with OpenMPI-1.3.1 I still have not been able to reproduce the hang, but I'm still looking into it. I did commit a fix for the datatype copy error that I mentioned (r21080 i

Re: [OMPI users] Checkpointing hangs with OpenMPI-1.3.1

2009-04-28 Thread neeraj
Owned Subsidiary of TATA SONS Ltd) P: +91.9225520634 Josh Hursey Sent by: users-boun...@open-mpi.org 04/28/2009 12:34 AM Please respond to Open MPI Users To Open MPI Users cc Subject Re: [OMPI users] Checkpointing hangs with OpenMPI-1.3.1 I still have not been able to reproduce the

Re: [OMPI users] Checkpointing hangs with OpenMPI-1.3.1

2009-04-27 Thread Josh Hursey
I still have not been able to reproduce the hang, but I'm still looking into it. I did commit a fix for the datatype copy error that I mentioned (r21080 in the Open MPI trunk, and it is in the pipeline for v1.3). Can you put in a print statement before MPI_Finalize, then try the program a

Re: [OMPI users] Checkpointing hangs with OpenMPI-1.3.1

2009-04-27 Thread Josh Hursey
Sorry for the long delay to respond. It is a bit odd that the hang does not occur when running on only one host. I suspect that is more due to timing than anything else. I am not able to reproduce the hang at the moment, but I do get an occasional datatype copy error which could be symptoma

[OMPI users] Checkpointing hangs with OpenMPI-1.3.1

2009-04-10 Thread neeraj
Dear All, I am trying to checkpoint a test application using openmpi-1.3.1, but it fails when I run multiple processes on different nodes. Checkpointing runs fine if the processes are running on the same node as the mpirun process. But the moment I launch MPI processes from a different node,

Re: [OMPI users] Checkpointing fails with BLCR 0.8.0b2

2008-12-10 Thread Josh Hursey
This issue has been addressed in the Open MPI trunk with r20114. This fix will be moved to the v1.3 branch in the next few days and will be in the eventual v1.3.0 release. Thanks again for the heads up on this issue. Cheers, Josh On Dec 4, 2008, at 7:50 AM, Josh Hursey wrote: Matthias, T

Re: [OMPI users] Checkpointing fails with BLCR 0.8.0b2

2008-12-04 Thread Josh Hursey
Matthias, Thank you for the heads up. I'll work on a fix that uses the cr_request_checkpoint() interface instead of cr_request_file() when appropriate. I filed a ticket about it if you are interested in tracking the progress on this bug: https://svn.open-mpi.org/trac/ompi/ticket/1691

[OMPI users] Checkpointing fails with BLCR 0.8.0b2

2008-12-04 Thread Matthias Hovestadt
Hi! Berkeley recently released a new version of BLCR. They had already marked the function cr_request_file as deprecated in BLCR 0.7.3, and they have now removed deprecated functions from the libcr API. Since OMPI's checkpointing support uses cr_request_file, all checkpointing operations fail with BLCR

Re: [OMPI users] Checkpointing a restarted app fails

2008-09-24 Thread Matthias Hovestadt
Hi Josh! I believe this is now fixed in the trunk. I was able to reproduce with the current trunk and committed a fix a few minutes ago in r19601. So the fix should be in tonight's tarball (or you can grab it from SVN). I've made a request to have the patch applied to v1.3, but that may take a d

Re: [OMPI users] Checkpointing a restarted app fails

2008-09-22 Thread Josh Hursey
I believe this is now fixed in the trunk. I was able to reproduce with the current trunk and committed a fix a few minutes ago in r19601. So the fix should be in tonight's tarball (or you can grab it from SVN). I've made a request to have the patch applied to v1.3, but that may take a day o

Re: [OMPI users] Checkpointing a restarted app fails

2008-09-17 Thread Matthias Hovestadt
Hi Josh! First of all, thanks a lot for replying. :-) When executing this checkpoint command, the running application directly aborts, even though I did not specify the "--term" option: -- mpirun noticed that process ran

Re: [OMPI users] Checkpointing a restarted app fails

2008-09-17 Thread Josh Hursey
On Sep 16, 2008, at 11:18 PM, Matthias Hovestadt wrote: Hi! Since I am interested in fault tolerance, checkpointing and restart of OMPI is an interesting feature for me. So I installed BLCR 0.7.3 as well as OMPI from SVN (rev. 19553). For OMPI I followed the instructions in the "Fault Tolerance

[OMPI users] Checkpointing a restarted app fails

2008-09-16 Thread Matthias Hovestadt
Hi! Since I am interested in fault tolerance, checkpointing and restart of OMPI is an interesting feature for me. So I installed BLCR 0.7.3 as well as OMPI from SVN (rev. 19553). For OMPI I followed the instructions in the "Fault Tolerance Guide" in the OMPI wiki: ./autogen.sh ./configure --with-