Hi
I was trying out the staging option for checkpoints, where I save the
checkpoint image in the local file system and have the image transferred to
the global file system in the background. As part of the background process I
see that the "scp" command is launched to transfer the images from the local
file sys
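(For context, a rough sketch of the kind of invocation I have in mind. The
parameter names snapc_base_store_in_place, crs_base_snapshot_dir, and
snapc_base_global_snapshot_dir are from my recollection of the C/R settings
and may not match your build, so treat them as assumptions; the paths and
process count are placeholders:

  $ mpirun -am ft-enable-cr \
      -mca snapc_base_store_in_place 0 \
      -mca crs_base_snapshot_dir /local/scratch/ckpt \
      -mca snapc_base_global_snapshot_dir /global/fs/ckpt \
      -np 8 ./my_app
)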
Hi
If I replace MPI_Bcast() with paired MPI_Send() and MPI_Recv() calls,
what kind of impact does it have on the performance of the program? Are
there any benchmarks of MPI_Bcast() vs. paired MPI_Send() and
MPI_Recv()?
Thanks
Ananda
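(For illustration, a minimal sketch of the paired-call replacement being
asked about; this is my own example, not code from the thread. The root
issues one MPI_Send per non-root rank, so the cost grows linearly with the
number of processes, whereas MPI_Bcast is free to use a tree or pipeline
algorithm internally:

  /* Naive "flat" broadcast built from paired MPI_Send/MPI_Recv calls. */
  #include <mpi.h>

  static void flat_bcast(void *buf, int count, MPI_Datatype type,
                         int root, MPI_Comm comm)
  {
      int rank, size;
      MPI_Comm_rank(comm, &rank);
      MPI_Comm_size(comm, &size);

      if (rank == root) {
          /* Root sends the buffer to every other rank, one at a time. */
          for (int r = 0; r < size; r++) {
              if (r != root) {
                  MPI_Send(buf, count, type, r, 0, comm);
              }
          }
      } else {
          /* Every other rank posts one matching receive from the root. */
          MPI_Recv(buf, count, type, root, 0, comm, MPI_STATUS_IGNORE);
      }
  }

On small process counts and small messages the difference may be hard to
measure; at larger scale the collective is normally the clear winner.)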
Josh
I have a few more observations that I want to share with you.
I modified the earlier C program a little by making two MPI_Bcast() calls
inside a while loop for 10 seconds. The issue of MPI_Bcast() failing with
an ERR_TRUNCATE error message resurfaces when I take a checkpoint of this program.
Int
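(For reference, a minimal sketch of the kind of loop described above; this
is my reconstruction, not the actual test program:

  /* Two MPI_Bcast calls inside a loop that runs for roughly 10 seconds. */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, a = 0, b = 0;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      double start = MPI_Wtime();
      while (MPI_Wtime() - start < 10.0) {
          if (rank == 0) { a++; b++; }
          /* A checkpoint taken while this loop is running is what
             triggered the reported ERR_TRUNCATE failure. */
          MPI_Bcast(&a, 1, MPI_INT, 0, MPI_COMM_WORLD);
          MPI_Bcast(&b, 1, MPI_INT, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }
)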
Josh
Thanks for addressing the issue. I will try the new version that has
your fix and let you know.
BTW, I have been in touch with the mpi4py team as well to debug this issue.
According to the mpi4py team, MPI_Bcast() is implemented with two collective
calls: the first is an MPI_Bcast() of a single intege
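(If it helps to picture it, a rough C sketch of that two-step pattern, based
only on the description above; the function and variable names are mine:

  /* Two-phase broadcast of a variable-length buffer:
   * step 1 broadcasts the length as a single int,
   * step 2 broadcasts the payload itself. */
  #include <mpi.h>
  #include <stdlib.h>

  static void bcast_blob(char **buf, int *len, int root, MPI_Comm comm)
  {
      int rank;
      MPI_Comm_rank(comm, &rank);

      MPI_Bcast(len, 1, MPI_INT, root, comm);       /* first collective: size */
      if (rank != root) {
          *buf = malloc((size_t)*len);              /* receivers allocate space */
      }
      MPI_Bcast(*buf, *len, MPI_CHAR, root, comm);  /* second collective: data */
  }

If a checkpoint/restart slips in between the two collectives, one can imagine
the ranks disagreeing about the pending message size, which would presumably
surface as the ERR_TRUNCATE error mentioned earlier.)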
Josh
I have one more update on my observations while analyzing this issue.
Just to refresh, I am using the openmpi trunk (r23596) with
mpi4py-1.2.1 and BLCR 0.8.2. When I checkpoint the python script written
using mpi4py, the program doesn't progress after the checkpoint is taken
successfully
Josh
I tried running the mpi4py program with the latest trunk version of
openmpi. I compiled openmpi-1.7a1r23596 from the trunk and recompiled
mpi4py to use this library. Unfortunately, I see the same behavior as I
saw with openmpi 1.4.2, i.e., the checkpoint will be successful but the
program does
OK, I will do that.
But did you try this program on a system where the latest trunk is
installed? Were you successful in checkpointing?
- Ananda
Josh
I have stack traces of all 8 python processes, taken when I observed the hang after
successful completion of the checkpoint. They are in the attached document. Please
see if these stack traces provide any clues.
Thanks
Ananda
Josh
I am having problems compiling the sources from the latest trunk. It
complains that libgomp.spec is missing even though that file exists on my
system. I will see if I have to change any other environment variables
to have a successful compilation. I will keep you posted.
BTW, were you successful
Josh
Attached is the python program that reproduces the hang that
I described. The initial part of the file describes the prerequisite
modules and the steps to reproduce the problem. Please let me know if
you have any questions about reproducing the hang.
Please note that, if I add the foll
Hi
I have integrated mpi4py with openmpi 1.4.2 built with BLCR
0.8.2. When I run ompi-checkpoint on a program written using mpi4py, I
see that the program sometimes doesn't resume after successful checkpoint
creation. This doesn't always occur, meaning the program resumes after
successful
That's correct. I have prefixed them with OMPI_MCA_ when defining them
in my environment. Despite that, I still see some of these files being
created under the default directory /tmp, which is different from what I
had set.
Thanks
Ananda
Hi
I am using open mpi v1.3.4 with BLCR 0.8.2. I have been testing my
openmpi-based program on a 3-node cluster (each node is an Intel Nehalem
based dual quad core) and I have successfully checkpointed and restarted
the program multiple times.
Recently I moved to a 15 node
Ralph
Defining these parameters in my environment also did not resolve the
problem. Whenever I restart my program, the temporary files are getting
stored in the default /tmp directory instead of the directory I had
defined.
Thanks
Ananda
Ralph
When you say manually, do you mean setting these parameters on the
command line when calling mpirun, ompi-restart, and ompi-checkpoint? Or
is there another way to set these parameters?
Thanks
Ananda
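(For what it's worth, the command-line form I was asking about would look
roughly like this. I am assuming here that ompi-checkpoint and ompi-restart
accept -mca options the same way mpirun does; the application name, process
count, PID, and snapshot handle are placeholders:

  $ mpirun -am ft-enable-cr -mca opal_cr_tmp_dir /home/ananda/OPAL -np 8 ./my_app
  $ ompi-checkpoint -mca opal_cr_tmp_dir /home/ananda/OPAL <pid_of_mpirun>
  $ ompi-restart -mca opal_cr_tmp_dir /home/ananda/OPAL <snapshot_handle>
)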
Ralph
I have these parameters set in ~/.openmpi/mca-params.conf file
$ cat ~/.openmpi/mca-params.conf
orte_tmpdir_base = /home/ananda/ORTE
opal_cr_tmp_dir = /home/ananda/OPAL
$
Should I be setting OMPI_MCA_opal_cr_tmp_dir?
FYI, I am using openmpi 1.3.4 with blcr 0.8.2
Thanks
Ananda
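(In case it is useful, the environment-variable form corresponding to those
two entries simply prefixes the same parameter names with OMPI_MCA_:

  $ export OMPI_MCA_orte_tmpdir_base=/home/ananda/ORTE
  $ export OMPI_MCA_opal_cr_tmp_dir=/home/ananda/OPAL
)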
Thanks Ralph.
Another question. Even though I am setting opal_cr_tmp_dir to a
directory other than /tmp when calling the ompi-restart command, this
setting is not getting passed to the mpirun command that gets generated
by ompi-restart. How do I overcome this constraint?
Thanks
Ananda
I am setting the MCA parameter "opal_cr_tmp_dir" to a directory other
than /tmp when calling the "mpirun", "ompi-restart", and "ompi-checkpoint"
commands so that I don't fill up the /tmp filesystem. But I see that the
openmpi-sessions* directory is still getting created under /tmp. How do
I overcome this prob
Hi
I am using open-mpi 1.3.4 with BLCR. Sometimes I run into a
strange problem with the ompi-checkpoint command. Even though I see that all
MPI processes (equal to the np argument) are running, the ompi-checkpoint
command fails at times. I have always seen this failure when the MPI
processes spawned
The description of the MCA parameter "opal_cr_use_thread" is very short at
this URL: http://osl.iu.edu/research/ft/ompi-cr/api.php
Can someone explain the usefulness of enabling this parameter versus
disabling it? In other words, what are the pros and cons of disabling it?
I found that this gets enabled automa
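(My understanding, stated tentatively: the parameter controls a dedicated
checkpoint/restart thread that polls for incoming checkpoint requests even
while the application is busy outside the MPI library, at the cost of some
background CPU overhead; it appears to be compiled in only when Open MPI is
configured with thread support for fault tolerance. Assuming it can be
toggled like any other MCA parameter, something like:

  $ mpirun -am ft-enable-cr -mca opal_cr_use_thread 0 -np 8 ./my_app
  $ mpirun -am ft-enable-cr -mca opal_cr_use_thread 1 -np 8 ./my_app

where ./my_app and the process count are placeholders.)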
Hi
If I run my compute-intensive openmpi-based program using a regular
invocation of mpirun (i.e., mpirun -host <hosts> -np <nprocs> <executable>),
it completes in a few seconds, but if I run the same program
with the "-am ft-enable-cr" option, the program takes 10x the time to complete.
If I enable hyperthreading on my cluster
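(For concreteness, the two invocations being compared look roughly like this,
with the host list, process count, and program name as placeholders:

  $ mpirun -host <hosts> -np <nprocs> ./my_app
  $ mpirun -am ft-enable-cr -host <hosts> -np <nprocs> ./my_app
)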
When I checkpoint my openmpi application using ompi-checkpoint, I see
that the top command suddenly shows some really large numbers in the "CPU %"
field, such as 150%, 200%, etc. After some time these numbers come back
down to normal values under 100%. This happens right around the time the
checkpoint is comp
I am observing a very strange performance issue with my openmpi program.
I have a compute-intensive openmpi-based application that keeps the data
in memory, processes the data, and then dumps it to a GPFS parallel file
system. The GPFS parallel file system server is connected to a QDR
infiniband switch fro