Nope. I probably won't get to it for a while. I'll let you know if I do.

On Aug 13, 2010, at 12:17 PM, <ananda.mu...@wipro.com> <ananda.mu...@wipro.com> 
wrote:

> OK, I will do that.
> 
> But did you try this program on a system where the latest trunk is
> installed? Were you successful in checkpointing?
> 
> - Ananda
> -----Original Message-----
> Message: 9
> Date: Fri, 13 Aug 2010 10:21:29 -0400
> From: Joshua Hursey <jjhur...@open-mpi.org>
> Subject: Re: [OMPI users] users Digest, Vol 1658, Issue 2
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <7a43615b-a462-4c72-8112-496653d8f...@open-mpi.org>
> Content-Type: text/plain; charset=us-ascii
> 
> I probably won't have an opportunity to work on reproducing this on
> 1.4.2. The trunk has a number of bug fixes that probably will not be
> backported to the 1.4 series (things have changed too much since that
> branch). So I would suggest trying the 1.5 series.
> 
> -- Josh
> 
> On Aug 13, 2010, at 10:12 AM, <ananda.mu...@wipro.com>
> <ananda.mu...@wipro.com> wrote:
> 
>> Josh
>> 
>> I am having problems compiling the sources from the latest trunk. It
>> complains of libgomp.spec missing even though that file exists on my
>> system. I will see if I have to change any other environment variables
>> to have a successful compilation. I will keep you posted.
>> 
>> BTW, were you successful in reproducing the problem on a system with
>> OpenMPI 1.4.2?
>> 
>> Thanks
>> Ananda
>> -----Original Message-----
>> Date: Thu, 12 Aug 2010 09:12:26 -0400
>> From: Joshua Hursey <jjhur...@open-mpi.org>
>> Subject: Re: [OMPI users] Checkpointing mpi4py program
>> To: Open MPI Users <us...@open-mpi.org>
>> Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org>
>> Content-Type: text/plain; charset=us-ascii
>> 
>> Can you try this with the current trunk (r23587 or later)?
>> 
>> I just added a number of new features and bug fixes, and I would be
>> interested to see if it fixes the problem. In particular I suspect that
>> this might be related to the Init/Finalize bounding of the checkpoint
>> region.
>> 
>> -- Josh
>> 
>> On Aug 10, 2010, at 2:18 PM, <ananda.mu...@wipro.com>
>> <ananda.mu...@wipro.com> wrote:
>> 
>>> Josh
>>> 
>>> Please find attached the Python program that reproduces the hang that
>>> I described. The initial part of the file describes the prerequisite
>>> modules and the steps to reproduce the problem. Please let me know if
>>> you have any questions in reproducing the hang.
>>> 
>>> Please note that if I add the following lines at the end of the program
>>> (in case sleep_time is True), the problem disappears, i.e., the program
>>> resumes successfully after successful completion of the checkpoint.
>>> # Add following lines at the end for sleep_time is True
>>> else:
>>>     time.sleep(0.1)
>>> # End of added lines
>>> 
>>> 
>>> Thanks a lot for your time in looking into this issue.
>>> 
>>> Regards
>>> Ananda
>>> 
>>> Ananda B Mudar, PMP
>>> Senior Technical Architect
>>> Wipro Technologies
>>> Ph: 972 765 8093
>>> ananda.mu...@wipro.com
>>> 
>>> 
>>> -----Original Message-----
>>> Date: Mon, 9 Aug 2010 16:37:58 -0400
>>> From: Joshua Hursey <jjhur...@open-mpi.org>
>>> Subject: Re: [OMPI users] Checkpointing mpi4py program
>>> To: Open MPI Users <us...@open-mpi.org>
>>> Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org>
>>> Content-Type: text/plain; charset=windows-1252
>>> 
>>> I have not tried to checkpoint an mpi4py application, so I cannot say
>>> for sure if it works or not. You might be hitting something with the
>>> Python runtime interacting in an odd way with either Open MPI or BLCR.
>>> 
>>> Can you attach a debugger and get a backtrace on a stuck checkpoint?
>>> That might show us where things are held up.
>>> 
>>> -- Josh
>>> 
>>> 
>>> On Aug 9, 2010, at 4:04 PM, <ananda.mu...@wipro.com>
>>> <ananda.mu...@wipro.com> wrote:
>>> 
>>>> Hi
>>>> 
>>>> I have integrated mpi4py with Open MPI 1.4.2, which was built with
>>>> BLCR 0.8.2. When I run ompi-checkpoint on a program written using
>>>> mpi4py, I see that the program sometimes doesn't resume after
>>>> successful checkpoint creation. This doesn't always occur: most of
>>>> the time the program resumes after successful checkpoint creation and
>>>> completes successfully. Has anyone tested the checkpoint/restart
>>>> functionality with mpi4py programs? Are there any best practices that
>>>> I should keep in mind while checkpointing mpi4py programs?
>>>> 
>>>> Thanks for your time
>>>> - Ananda
>>>> Please do not print this email unless it is absolutely necessary.
>>>> 
>>>> The information contained in this electronic message and any
>>>> attachments to this message are intended for the exclusive use of the
>>>> addressee(s) and may contain proprietary, confidential or privileged
>>>> information. If you are not the intended recipient, you should not
>>>> disseminate, distribute or copy this e-mail. Please notify the sender
>>>> immediately and destroy all copies of this message and any
>>>> attachments.
>>>> 
>>>> WARNING: Computer viruses can be transmitted via email. The recipient
>>>> should check this email and any attachments for the presence of
>>>> viruses. The company accepts no liability for any damage caused by
>>>> any virus transmitted by this email.
>>>> 
>>>> www.wipro.com
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 

