his. I've replied further below:
>
>
> - Original Message -
>> From: Joshua Hursey
> [...]
>> What other configure options are you passing to Open MPI? Specifically the
>> configure test will always fail if '--with-ft=cr' is not specified - by
>> default Open MPI will only build the BLCR component if C/R FT is requested by the user.
What version of BLCR are you using?
What other configure options are you passing to Open MPI? Specifically the
configure test will always fail if '--with-ft=cr' is not specified - by default
Open MPI will only build the BLCR component if C/R FT is requested by the user.
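For reference, a C/R-enabled build usually looks something like the following
(the BLCR path and install prefix below are only placeholders, and
--enable-ft-thread is optional, it just turns on the asynchronous checkpoint
thread):

  ./configure --prefix=/opt/openmpi-cr \
              --with-ft=cr \
              --with-blcr=/usr/local/blcr \
              --enable-ft-thread
  make all install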
Can you send a zip'ed up
There are also 2 sample result files (cpu.256^3.8N.*) which show the
> execution time difference between 2 cases.
> Hope you can take some time to find the problem.
> Thanks for your kindness.
>
> Best Regards,
> Nguyen Toan
>
> On Wed, Mar 2, 2011 at 3:00 AM, Joshua Hurs
parameter you mentioned but it did not help; the unknown
> overhead still exists.
> Here I attach the output of 'ompi_info', both version 1.5 and 1.5.1.
> Hope you can find out the problem.
> Thank you.
>
> Regards,
> Nguyen Toan
>
> On Wed, Feb 9, 2011 at 11:08 PM, Josh
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
and MPI_Wait. Also I want to make only one checkpoint
> per application execution for my purpose, but the unknown overhead exists
> even when no checkpoint was taken.
>
> Do you have any other idea?
>
> Regards,
> Nguyen Toan
>
>
> On Wed, Feb 9, 2011 at 12:41 AM, Josh
> overhead, and how to eliminate it?
> Thanks.
>
> Regards,
> Nguyen
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
On Jan 27, 2011, at 9:47 AM, Reuti wrote:
> Am 27.01.2011 um 15:23 schrieb Joshua Hursey:
>
>> The current version of Open MPI does not support continued operation of an
>> MPI application after process failure within a job. If a process dies, so
>> will the MPI job. N
this group into
> a working communicator?
>
> Thanks,
> Kirk
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey
Running - ...
> [blade02:27130] [221.25 / 221.71] Finished -
> ompi_global_snapshot_27115.ckpt
> Snapshot Ref.: 0 ompi_global_snapshot_27115.ckpt
>
> As you see, it takes 200+ seconds to checkpoint. By the way, what do the former and
> latter numbers represent?
Checkpointing directly
to the shared file system causes the application to remain suspended until its
file is completely written, which may take a considerable amount of time
depending on the speed of the file system. Staging considerably reduces the
impact of checkpointing on application runtime.
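If your build has the SStore framework (ompi_info will show it), staging can be
selected with MCA parameters along these lines; the exact parameter names should
be confirmed with ompi_info, this is only a sketch:

  # Checkpoint to node-local disk first, then drain to the shared directory
  mpirun -np 4 -am ft-enable-cr \
         -mca sstore stage \
         -mca sstore_stage_local_snapshot_dir /tmp/ckpt_local \
         ./my_app    # application name is an example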
I sugg
doc/html/FAQ.html#prelink
If that doesn't work then I would suggest trying the current Open MPI trunk.
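For reference, the workaround that FAQ entry describes amounts to disabling
prelinking before checkpointing; roughly (paths are Fedora/RHEL style, the exact
steps are in the FAQ):

  # Undo prelinking on existing binaries and libraries (run as root)
  prelink -ua
  # Then keep it off, e.g. set PRELINKING=no in /etc/sysconfig/prelink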
There should not be any problem with using NFS; since this is occurring in
MPI_Init, it is well before we ever try to use the file system. I also test
with NFS, and local staging on a f
I am pleased to announce that Open MPI now supports checkpoint/restart process
migration and automatic recovery. This is in addition to our current support
for more traditional checkpoint/restart fault tolerance. These new features
were introduced in the Open MPI development trunk in commit r235
attached the stack traces of all the MPI processes that are part of
> the mpirun. I would really appreciate it if you could take a look at the stack trace and
> let me know the potential problem. I am kind of stuck at this point and need
> your assistance to move forward. Please let me know if you
anda
> -Original Message-
> Message: 9
> Date: Fri, 13 Aug 2010 10:21:29 -0400
> From: Joshua Hursey
> Subject: Re: [OMPI users] users Digest, Vol 1658, Issue 2
> To: Open MPI Users
> Message-ID: <7a43615b-a462-4c72-8112-496653d8f...@open-mpi.org>
> Content-Typ
I will keep you posted.
>
> BTW, were you successful in reproducing the problem on a system with
> OpenMPI 1.4.2?
>
> Thanks
> Ananda
> -Original Message-
> Date: Thu, 12 Aug 2010 09:12:26 -0400
> From: Joshua Hursey
> Subject: Re: [OMPI users] Checkpoint
...@wipro.com
>
>
> -Original Message-
> Date: Mon, 9 Aug 2010 16:37:58 -0400
> From: Joshua Hursey
> Subject: Re: [OMPI users] Checkpointing mpi4py program
> To: Open MPI Users
> Message-ID: <270bd450-743a-4662-9568-1fedfcc6f...@open-mpi.org>
> Content-Ty
I have not tried to checkpoint an mpi4py application, so I cannot say for sure
if it works or not. You might be hitting something with the Python runtime
interacting in an odd way with either Open MPI or BLCR.
Can you attach a debugger and get a backtrace on a stuck checkpoint? That might
show
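Something like the following is usually enough, the PID being that of the stuck
process:

  # Attach non-interactively and dump a backtrace from every thread
  gdb -p 12345 -batch -ex 'thread apply all bt' > backtrace.txt 2>&1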
That is interesting. I cannot think of any reason why this might be causing a
problem just in Open MPI. popen() is similar to fork()/system() so you have to
be careful with interconnects that do not play nice with fork(), like openib.
But since it looks like you are excluding openib, this should
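For anyone following along, excluding openib is normally done through BTL
selection, for example:

  # Run with every BTL except openib; my_app is just a placeholder
  mpirun -np 4 -mca btl ^openib ./my_app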
cr_thread_sleep_wait=1000
This will throttle down the C/R thread when the application is in the MPI library.
You might want to play around with these MCA parameters to tune the
aggressiveness of the C/R thread to your performance needs. In the mean time I
will look into finding better default para
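As a sketch, that parameter can go on the command line or into
$HOME/.openmpi/mca-params.conf; I believe the full name carries the opal_
prefix, so verify it with ompi_info:

  # Throttle the C/R thread while the application is inside the MPI library
  mpirun -np 4 -am ft-enable-cr -mca opal_cr_thread_sleep_wait 1000 ./my_app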
There is some overhead involved when activating the current C/R functionality
in Open MPI due to the wrapping of the internal point-to-point stack. The
wrapper (CRCP framework) tracks the signature of each message (not the buffer,
so constant time for any size MPI message) so that when we need t
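For context, that wrapper only comes into play when C/R is requested at run
time, typically via the ft-enable-cr AMCA file, so comparing the two runs below
is a quick way to isolate the overhead (my_app is a placeholder):

  mpirun -np 4 -am ft-enable-cr ./my_app   # C/R (and CRCP tracking) enabled
  mpirun -np 4 ./my_app                    # plain run, no C/R wrapping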
On Mar 4, 2010, at 8:17 AM, Fernando Lemos wrote:
> On Wed, Mar 3, 2010 at 10:24 PM, Fernando Lemos wrote:
>
>> Is there anything I can do to provide more information about this bug?
>> E.g. try to compile the code in the SVN trunk? I also have kept the
>> snapshots intact, I can tar them up an
On Mar 3, 2010, at 3:42 PM, Fernando Lemos wrote:
> On Wed, Mar 3, 2010 at 5:31 PM, Joshua Hursey wrote:
>
>>
>> Yes, ompi-restart should be printing a helpful message and exiting normally.
>> Thanks for the bug report. I believe that I have seen and fixed this on
On Mar 2, 2010, at 9:17 AM, Fernando Lemos wrote:
> On Sun, Feb 28, 2010 at 11:11 PM, Fernando Lemos
> wrote:
>> Hello,
>>
>>
>> I'm trying to come up with a fault tolerant OpenMPI setup for research
>> purposes. I'm doing some tests now, but I'm stuck with a segfault when
>> I try to restart
You can use the 'checkpoint to local disk' example to checkpoint and restart
without access to a globally shared storage devices. There is an example on the
website that does not use a globally mounted file system:
http://www.osl.iu.edu/research/ft/ompi-cr/examples.php#uc-ckpt-local
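As a rough sketch of what that example does (the parameter names below are from
memory, the page above is the authoritative reference):

  # Each process checkpoints to its own local disk, and mpirun gathers the
  # pieces into a global snapshot directory on its node
  mpirun -np 4 -am ft-enable-cr \
         -mca crs_base_snapshot_dir /tmp/local_ckpt \
         -mca snapc_base_store_in_place 0 \
         -mca snapc_base_global_snapshot_dir $HOME/ckpt_global \
         ./my_app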
What versi
On Jan 14, 2010, at 8:20 AM, Andreea Costea wrote:
> Hi,
>
> I wanted to try the C/R feature in OpenMPI version 1.4.1 that I have
> downloaded today. When I want to checkpoint I am having the following error
> message:
> [[65192,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line
On Jan 14, 2010, at 2:50 AM, Andreea Costea wrote:
> Hei there
>
> I have some questions regarding checkpoint/restart:
>
> 1. Until recently I thought that ompi-checkpoint and ompi-restart are used to
> checkpoint a process inside an MPI application. Now I reread this and I
> realized that actua
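In short, ompi-checkpoint and ompi-restart operate on the whole MPI job, not on
a single process; roughly (PID and snapshot name are placeholders):

  # Checkpoint the running job, identified by the PID of its mpirun
  ompi-checkpoint 4242
  # ... this prints a snapshot reference such as ompi_global_snapshot_4242.ckpt
  # Later, restart the entire job from that snapshot
  ompi-restart ompi_global_snapshot_4242.ckpt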
The --preload-* options to 'mpirun' currently use the ssh/scp commands (or
rsh/rcp via an MCA parameter) to move files from the machine local to the
'mpirun' command to the compute nodes during launch. This assumes that you have
Open MPI already installed on all of the machines. It was an option
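A hedged example of how they are typically used (option spellings as I recall
them, check 'mpirun --help' on your version):

  # Push the executable and an input file out to the compute nodes at launch
  mpirun -np 8 --preload-binary --preload-files input.dat ./my_app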
On Sep 25, 2009, at 7:10 AM, Mallikarjuna Shastry wrote:
Dear sir,
I am sending the details as follows:
1. I am using openmpi-1.3.3 and blcr 0.8.2
2. I have installed blcr 0.8.2 first under /root/MS
3. Then I installed openmpi 1.3.3 under /root/MS
4. I have configured and installed Open MPI as
On Sep 16, 2009, at 8:30 AM, Marcin Stolarek wrote:
Hi,
It seems I solved my problem. The root of the error was that I hadn't
loaded the BLCR module, so I couldn't checkpoint even a single-threaded
application.
I am glad to hear that you have things working now.
However, I still can't find MCA:blcr i
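For completeness, a quick way to check both pieces on a node (module and
component names are the usual BLCR ones):

  lsmod | grep blcr        # is the BLCR kernel module loaded?
  sudo modprobe blcr       # if not, load it (pulls in blcr_imports as well)
  ompi_info | grep -i crs  # does this build include the blcr CRS component?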