I probably won't have an opportunity to work on reproducing this on 1.4.2. 
The trunk has a number of bug fixes that probably will not be backported to the 
1.4 series (things have changed too much since that branch). So I would suggest 
trying the 1.5 series.

-- Josh

On Aug 13, 2010, at 10:12 AM, <ananda.mu...@wipro.com> <ananda.mu...@wipro.com> 
wrote:

> Josh
> 
> I am having problems compiling the sources from the latest trunk. It
> complains of libgomp.spec missing even though that file exists on my
> system. I will see if I have to change any other environment variables
> to have a successful compilation. I will keep you posted.
> 
> BTW, were you successful in reproducing the problem on a system with
> OpenMPI 1.4.2?
> 
> Thanks
> Ananda
> -----Original Message-----
> Date: Thu, 12 Aug 2010 09:12:26 -0400
> From: Joshua Hursey <jjhur...@open-mpi.org>
> Subject: Re: [OMPI users] Checkpointing mpi4py program
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <1f1445ab-9208-4ef0-af25-5926bd53c...@open-mpi.org>
> Content-Type: text/plain; charset=us-ascii
> 
> Can you try this with the current trunk (r23587 or later)?
> 
> I just added a number of new features and bug fixes, and I would be
> interested to see if it fixes the problem. In particular I suspect that
> this might be related to the Init/Finalize bounding of the checkpoint
> region.
> 
> -- Josh
> 
> On Aug 10, 2010, at 2:18 PM, <ananda.mu...@wipro.com>
> <ananda.mu...@wipro.com> wrote:
> 
>> Josh
>> 
>> Please find attached the Python program that reproduces the hang that
>> I described. The initial part of this file describes the prerequisite
>> modules and the steps to reproduce the problem. Please let me know if
>> you have any questions in reproducing the hang.
>> 
>> Please note that if I add the following lines at the end of the program
>> (in case sleep_time is True), the problem disappears, i.e., the program
>> resumes successfully after successful completion of the checkpoint.
>> # Add following lines at the end for sleep_time is True
>> else:
>>      time.sleep(0.1)
>> # End of added lines
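
One reading of the fragment above, as a minimal sketch rather than the attached program (which the thread does not reproduce): the loop already pauses when sleep_time is True, and the added else branch makes every iteration pause briefly. The function name run_loop, the allreduce call standing in for the real work, and the long_pause and short_pause parameters are all assumptions made for illustration.

```python
import time


def run_loop(comm, iterations, sleep_time, long_pause=1.0, short_pause=0.1):
    """Hypothetical main loop; the real program's structure is assumed."""
    total = 0
    for _ in range(iterations):
        # Stand-in for the real work: a collective call on every iteration.
        # With mpi4py, comm.allreduce sums the values from all ranks by default.
        total += comm.allreduce(comm.Get_rank())
        if sleep_time:
            # Assumed pre-existing branch: the program pauses on request.
            time.sleep(long_pause)
        else:
            # The added lines: pause briefly even when no sleep was requested.
            time.sleep(short_pause)
    return total
```

To try it under MPI, import MPI from mpi4py and call run_loop(MPI.COMM_WORLD, 100, sleep_time=False) in a script started with the usual MPI launcher. Whether the short pause addresses the cause of the hang or only hides a timing issue is not established in this thread.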
>> 
>> 
>> Thanks a lot for your time in looking into this issue.
>> 
>> Regards
>> Ananda
>> 
>> Ananda B Mudar, PMP
>> Senior Technical Architect
>> Wipro Technologies
>> Ph: 972 765 8093
>> ananda.mu...@wipro.com
>> 
>> 
>> -----Original Message-----
>> Date: Mon, 9 Aug 2010 16:37:58 -0400
>> From: Joshua Hursey <jjhur...@open-mpi.org>
>> Subject: Re: [OMPI users] Checkpointing mpi4py program
>> To: Open MPI Users <us...@open-mpi.org>
>> Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org>
>> Content-Type: text/plain; charset=windows-1252
>> 
>> I have not tried to checkpoint an mpi4py application, so I cannot say
>> for sure if it works or not. You might be hitting something with the
>> Python runtime interacting in an odd way with either Open MPI or BLCR.
>> 
>> Can you attach a debugger and get a backtrace on a stuck checkpoint?
>> That might show us where things are held up.
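
For the Python side of such a backtrace, a small sketch using the standard-library faulthandler module (Python 3.3 or later; its register function is not available on Windows) could look like the following. The helper name enable_stack_dump and the choice of SIGUSR2 are assumptions; pick a signal that the checkpoint tooling does not already use. This reports Python frames only, so a hang inside Open MPI or BLCR native code would still call for a native debugger as requested above.

```python
import faulthandler
import signal
import sys


def enable_stack_dump(signum=signal.SIGUSR2, stream=None):
    """Dump the Python traceback of every thread when `signum` arrives.

    `stream` must be a real file object (it needs a file descriptor);
    it defaults to standard error.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.register(signum, file=stream, all_threads=True)
    return signum
```

Call enable_stack_dump() near the top of the mpi4py script. When a rank appears stuck, send the chosen signal to its process ID (for example with the kill command) and read the traceback from that rank's standard error.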
>> 
>> -- Josh
>> 
>> 
>> On Aug 9, 2010, at 4:04 PM, <ananda.mu...@wipro.com>
>> <ananda.mu...@wipro.com> wrote:
>> 
>>> Hi
>>> 
>>> I have integrated mpi4py with Open MPI 1.4.2 that was built with BLCR
>>> 0.8.2. When I run ompi-checkpoint on a program written using mpi4py, I
>>> see that the program sometimes doesn't resume after successful checkpoint
>>> creation. This doesn't always occur: most of the time the program resumes
>>> after successful checkpoint creation and completes successfully. Has
>>> anyone tested the checkpoint/restart functionality with mpi4py programs?
>>> Are there any best practices that I should keep in mind while
>>> checkpointing mpi4py programs?
>>> 
>>> Thanks for your time
>>> -          Ananda
>>> Please do not print this email unless it is absolutely necessary.
>>> 
>>> The information contained in this electronic message and any
>>> attachments to this message are intended for the exclusive use of the
>>> addressee(s) and may contain proprietary, confidential or privileged
>>> information. If you are not the intended recipient, you should not
>>> disseminate, distribute or copy this e-mail. Please notify the sender
>>> immediately and destroy all copies of this message and any attachments.
>>> 
>>> WARNING: Computer viruses can be transmitted via email. The recipient
>>> should check this email and any attachments for the presence of viruses.
>>> The company accepts no liability for any damage caused by any virus
>>> transmitted by this email.
>>> 
>>> www.wipro.com
>>> 
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

