[OMPI users] OpenMPI Checkpoint/Restart components

2009-12-06 Thread Andreea Costea
Hi there. Lately I've been reading lots of papers about fault tolerance for MPI applications. All seemed very nice and clear. But as soon as I moved from reading to testing, I was surprised to find that I could not find implementations. The best I could find is the possibility of manually check

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-06 Thread Katz, Jacob
Thanks. Yes, I meant in the question that I was looking for something creative, both fast-responding and not using 100% CPU all the time. I guess I’m not the first one to face this question. Has anyone done anything “better” than the simple solution? Jacob M. Kat

Re: [OMPI users] Tons of warnings in running my first openmpi job

2009-12-06 Thread Jeff Squyres
It looks like your version of Open MPI is compiled and linked against the DAPL library, but the DAPL library is not present on your system (or it is incorrectly installed...? I'm not very familiar with DAPL). You should probably contact your sysadmin to ask about the DAPL installation. FWIW,

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-06 Thread Douglas Guptill
On Sun, Dec 06, 2009 at 02:29:01PM +0200, Katz, Jacob wrote: > Thanks. > Yes, I meant in the question that I was looking for something creative, both > fast-responding and not using 100% CPU all the time. > I guess I’m not the first one to face this question. Has anyone done > anything “better”

[OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-06 Thread Katz, Jacob
Hi, Is there a way to detect a situation in which one of the processes in an MPI application exits without even calling MPI_Init()? I have a case in which all the processes except one are stuck forever in MPI_Init(), and that one exits before being able to call MPI_Init()... I tried using the mca par

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-06 Thread Katz, Jacob
Thanks, Douglas. I found your code in the archive. Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet: (8)-465-5726 -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Douglas Guptill Sen

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-06 Thread Ralph Castain
The system should see that app fail and abort the job - whether it calls MPI_Init first or not is irrelevant. What version are you using? On Sun, Dec 6, 2009 at 8:40 AM, Katz, Jacob wrote: > Hi, > > Is there a way to detect a situation in which one of the processes in an MPI > application exits wit

[OMPI users] a good grid simulator to run open MPI applications

2009-12-06 Thread Kritiraj Sajadah
Hi All, Can you recommend a good open-source grid simulation tool for running Open MPI applications? Thanks, Raj

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-06 Thread Katz, Jacob
I'm using 1.3.3. The job isn't aborted in my case when the failing process hasn't called MPI_Init... It is aborted if the process has called MPI_Init... Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet: (8)-465-5

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-06 Thread Katz, Jacob
By the way, there is no way to time-out a call to MPI_Init(), or is there? Jacob M. Katz | jacob.k...@intel.com | Work: +972-4-865-5726 | iNet: (8)-465-5726 -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf

Re: [OMPI users] How to detect a failure to start-up and MPI_Init()?

2009-12-06 Thread Ralph Castain
I'll look into it - sounds like a bug. Thanks! On Sun, Dec 6, 2009 at 9:13 AM, Katz, Jacob wrote: > I’m using 1.3.3. > > The job isn’t aborted in my case when the failing process hasn’t called > MPI_Init… It is aborted if the process has called MPI_Init… > > > > -

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-06 Thread Eugene Loh
Douglas Guptill wrote: On Sun, Dec 06, 2009 at 02:29:01PM +0200, Katz, Jacob wrote: Yes, I meant in the question that I was looking for something creative, both fast-responding and not using 100% CPU all the time. I guess I’m not the first one to face this question. Has anyone don
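
For readers following the "Mimicking timeout for MPI_Wait" thread: the snippets above do not show the code that was actually exchanged, so the following is only a minimal sketch of one common approach, not the posters' solution. It polls a completion check, sleeps between polls with a capped exponential backoff so the CPU is not pinned at 100%, and gives up after a deadline. The names `poll_fn`, `wait_with_timeout`, and `countdown_poll` are hypothetical; in a real MPI program the callback would presumably wrap `MPI_Test` on the pending request and return whether its flag was set.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns true once the operation is complete.
 * Assumption: in MPI code this would call MPI_Test(&request, &flag, &status)
 * and return flag != 0. */
typedef bool (*poll_fn)(void *ctx);

/* Monotonic clock reading in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Poll until the operation completes or timeout_sec elapses.
 * Returns true on completion, false on timeout. */
bool wait_with_timeout(poll_fn poll, void *ctx, double timeout_sec) {
    const long max_sleep_ns = 10L * 1000 * 1000; /* cap each sleep at 10 ms */
    long sleep_ns = 1000;                        /* start at 1 microsecond */
    const double deadline = now_seconds() + timeout_sec;

    for (;;) {
        if (poll(ctx)) {
            return true;
        }
        if (now_seconds() >= deadline) {
            return false;
        }
        struct timespec ts = {0, sleep_ns};
        nanosleep(&ts, NULL);
        /* Back off: short sleeps keep latency low for fast completions,
         * longer sleeps keep CPU usage low for slow ones. */
        sleep_ns *= 2;
        if (sleep_ns > max_sleep_ns) {
            sleep_ns = max_sleep_ns;
        }
    }
}

/* Stand-in callback for demonstration only: ctx points to an int counter,
 * and the "operation" completes when the counter reaches zero. */
bool countdown_poll(void *ctx) {
    int *remaining = (int *)ctx;
    *remaining -= 1;
    return *remaining <= 0;
}
```

The 10 ms cap bounds the worst-case extra latency after the operation completes; a smaller cap responds faster at the cost of more wakeups. What to do on timeout (for example, cancelling the request) is left to the caller and is not covered by this sketch.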