Nope. I probably won't get to it for a while. I'll let you know if I do.

On Aug 13, 2010, at 12:17 PM, <ananda.mu...@wipro.com> <ananda.mu...@wipro.com> wrote:
> OK, I will do that.
>
> But did you try this program on a system where the latest trunk is
> installed? Were you successful in checkpointing?
>
> - Ananda
> -----Original Message-----
> Message: 9
> Date: Fri, 13 Aug 2010 10:21:29 -0400
> From: Joshua Hursey <jjhur...@open-mpi.org>
> Subject: Re: [OMPI users] users Digest, Vol 1658, Issue 2
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <7a43615b-a462-4c72-8112-496653d8f...@open-mpi.org>
> Content-Type: text/plain; charset=us-ascii
>
> I probably won't have an opportunity to work on reproducing this on
> 1.4.2. The trunk has a bunch of bug fixes that probably will not be
> backported to the 1.4 series (things have changed too much since that
> branch). So I would suggest trying the 1.5 series.
>
> -- Josh
>
> On Aug 13, 2010, at 10:12 AM, <ananda.mu...@wipro.com>
> <ananda.mu...@wipro.com> wrote:
>
>> Josh
>>
>> I am having problems compiling the sources from the latest trunk. It
>> complains that libgomp.spec is missing, even though that file exists on
>> my system. I will see whether I have to change any other environment
>> variables to get a successful compilation. I will keep you posted.
>>
>> BTW, were you successful in reproducing the problem on a system with
>> OpenMPI 1.4.2?
>>
>> Thanks
>> Ananda
>> -----Original Message-----
>> Date: Thu, 12 Aug 2010 09:12:26 -0400
>> From: Joshua Hursey <jjhur...@open-mpi.org>
>> Subject: Re: [OMPI users] Checkpointing mpi4py program
>> To: Open MPI Users <us...@open-mpi.org>
>> Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org>
>> Content-Type: text/plain; charset=us-ascii
>>
>> Can you try this with the current trunk (r23587 or later)?
>>
>> I just added a number of new features and bug fixes, and I would be
>> interested to see if it fixes the problem. In particular, I suspect that
>> this might be related to the Init/Finalize bounding of the checkpoint
>> region.
>>
>> -- Josh
>>
>> On Aug 10, 2010, at 2:18 PM, <ananda.mu...@wipro.com>
>> <ananda.mu...@wipro.com> wrote:
>>
>>> Josh
>>>
>>> Please find attached the Python program that reproduces the hang
>>> that I described. The initial part of this file describes the
>>> prerequisite modules and the steps to reproduce the problem. Please
>>> let me know if you have any questions in reproducing the hang.
>>>
>>> Please note that if I add the following lines at the end of the
>>> program (when sleep_time is True), the problem disappears, i.e., the
>>> program resumes successfully after the checkpoint completes.
>>>
>>>     # Add the following lines at the end when sleep_time is True
>>>     else:
>>>         time.sleep(0.1)
>>>     # End of added lines
>>>
>>> Thanks a lot for your time in looking into this issue.
>>>
>>> Regards
>>> Ananda
>>>
>>> Ananda B Mudar, PMP
>>> Senior Technical Architect
>>> Wipro Technologies
>>> Ph: 972 765 8093
>>> ananda.mu...@wipro.com
>>>
>>>
>>> -----Original Message-----
>>> Date: Mon, 9 Aug 2010 16:37:58 -0400
>>> From: Joshua Hursey <jjhur...@open-mpi.org>
>>> Subject: Re: [OMPI users] Checkpointing mpi4py program
>>> To: Open MPI Users <us...@open-mpi.org>
>>> Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org>
>>> Content-Type: text/plain; charset=windows-1252
>>>
>>> I have not tried to checkpoint an mpi4py application, so I cannot say
>>> for sure if it works or not. You might be hitting something with the
>>> Python runtime interacting in an odd way with either Open MPI or BLCR.
>>>
>>> Can you attach a debugger and get a backtrace on a stuck checkpoint?
>>> That might show us where things are held up.
>>>
>>> -- Josh
>>>
>>>
>>> On Aug 9, 2010, at 4:04 PM, <ananda.mu...@wipro.com>
>>> <ananda.mu...@wipro.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I have integrated mpi4py with openmpi 1.4.2, which was built with
>>>> BLCR 0.8.2.
>>>> When I run ompi-checkpoint on the program written using mpi4py, I
>>>> see that the program sometimes doesn't resume after successful
>>>> checkpoint creation. This doesn't always occur: most of the time the
>>>> program resumes after successful checkpoint creation and completes
>>>> successfully. Has anyone tested the checkpoint/restart functionality
>>>> with mpi4py programs? Are there any best practices that I should keep
>>>> in mind while checkpointing mpi4py programs?
>>>>
>>>> Thanks for your time
>>>> - Ananda
>>>> Please do not print this email unless it is absolutely necessary.
>>>>
>>>> The information contained in this electronic message and any
>>>> attachments to this message are intended for the exclusive use of the
>>>> addressee(s) and may contain proprietary, confidential or privileged
>>>> information. If you are not the intended recipient, you should not
>>>> disseminate, distribute or copy this e-mail. Please notify the sender
>>>> immediately and destroy all copies of this message and any attachments.
>>>>
>>>> WARNING: Computer viruses can be transmitted via email. The recipient
>>>> should check this email and any attachments for the presence of viruses.
>>>> The company accepts no liability for any damage caused by any virus
>>>> transmitted by this email.
>>>>
>>>> www.wipro.com
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
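Because the hang is intermittent, one way to put a number on "sometimes" is to script the launch/checkpoint/wait cycle and repeat it. The sketch below is hypothetical: `checkpoint_trial`, `hang_rate`, and `reproducer.py` are placeholder names, and the launcher syntax (`mpirun -am ft-enable-cr ...`, with `ompi-checkpoint` given the PID of mpirun) is assumed from the Open MPI 1.4-era checkpoint/restart documentation, so check it against `ompi-checkpoint --help` on your own install. A run counts as "hung" only if it fails to exit within `resume_timeout` seconds after the checkpoint returns, so that timeout has to be longer than the program's normal remaining runtime.

```python
# Hypothetical harness: estimate how often a checkpointed run fails to resume.
# Assumptions (not from the thread): Open MPI 1.4-era C/R syntax
# `mpirun -am ft-enable-cr ...` and `ompi-checkpoint <PID of mpirun>`;
# `reproducer.py` and every function name below are placeholders.
import subprocess
import time
from typing import Callable, List, Sequence


def default_app_cmd(nprocs: int = 2, script: str = "reproducer.py") -> List[str]:
    # Launch the mpi4py program with the ft-enable-cr AMCA parameter set.
    return ["mpirun", "-am", "ft-enable-cr", "-np", str(nprocs), "python", script]


def default_checkpoint_cmd(mpirun_pid: int) -> List[str]:
    # ompi-checkpoint is given the PID of mpirun, not of an individual rank.
    return ["ompi-checkpoint", str(mpirun_pid)]


def checkpoint_trial(
    app_cmd: Sequence[str],
    checkpoint_cmd_for: Callable[[int], Sequence[str]],
    delay: float,
    resume_timeout: float,
) -> str:
    """Launch the app, checkpoint it once, then wait for it to finish.

    Returns "checkpoint_failed" if the checkpoint command exits non-zero,
    "completed" if the app exits within resume_timeout seconds after the
    checkpoint command returns, and "hung" otherwise.
    """
    app = subprocess.Popen(list(app_cmd))
    try:
        time.sleep(delay)  # give the job time to start before checkpointing
        ckpt = subprocess.run(list(checkpoint_cmd_for(app.pid)))
        if ckpt.returncode != 0:
            return "checkpoint_failed"
        try:
            app.wait(timeout=resume_timeout)
            return "completed"
        except subprocess.TimeoutExpired:
            return "hung"
    finally:
        # Do not leave a stuck job behind: try SIGTERM first, then SIGKILL.
        if app.poll() is None:
            app.terminate()
            try:
                app.wait(timeout=10)
            except subprocess.TimeoutExpired:
                app.kill()
                app.wait()


def hang_rate(trials: int, **trial_kwargs) -> float:
    # Fraction of trials in which the job did not finish after its checkpoint.
    outcomes = [checkpoint_trial(**trial_kwargs) for _ in range(trials)]
    return outcomes.count("hung") / trials
```

For example, `hang_rate(20, app_cmd=default_app_cmd(), checkpoint_cmd_for=default_checkpoint_cmd, delay=5.0, resume_timeout=120.0)` would run twenty trials. Passing the commands in as arguments keeps the harness independent of Open MPI, so it can be dry-run with ordinary processes before pointing it at mpirun.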
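On Josh's request for a backtrace on a stuck checkpoint: `gdb -p <pid>` followed by `thread apply all bt` shows the native (C) frames, but for an mpi4py rank it can also help to see where the Python code itself is waiting. One option, assuming Python 3.3 or later (newer than this 2010 thread), is the standard-library `faulthandler` module, which can dump the Python tracebacks of all threads when a signal arrives. The sketch below is a hypothetical helper: the function name, the per-process file naming, and the choice of SIGUSR1 are assumptions, and you should confirm that signal is not already used by your Open MPI/BLCR configuration.

```python
# Sketch: let a stuck mpi4py rank dump its Python-level stacks on demand.
# Assumptions: Python 3.3+ (faulthandler is in the standard library), a POSIX
# system, and SIGUSR1 not being needed for anything else in your Open MPI/BLCR
# setup. The function name and per-process file naming are hypothetical.
import faulthandler
import os
import signal


def enable_stack_dumps(path_template: str = "stack-{pid}.txt"):
    # One file per process, so each rank's dump lands in its own file.
    dump_file = open(path_template.format(pid=os.getpid()), "w")
    # On SIGUSR1, write the tracebacks of all Python threads to dump_file.
    # faulthandler writes to the file descriptor, so keep the file open for as
    # long as the handler stays registered.
    faulthandler.register(signal.SIGUSR1, file=dump_file, all_threads=True)
    return dump_file
```

Calling `enable_stack_dumps()` near the top of the program and then running `kill -USR1 <rank pid>` on a rank that stays stuck after `ompi-checkpoint` returns should leave that rank's Python stack in `stack-<pid>.txt`, which can be compared with the gdb backtrace of the same process.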