Re: [OMPI users] Fault Tolerant Features in OpenMPI

2013-08-12 Thread Edson Tavares de Camargo
Hi, George! I had studied the ULFM document before beginning the tests with failure detection in Open MPI, and it seems to me a good choice. But I'm having trouble with the ULFM-enabled version of Open MPI (openmpi-1.7ft_b3.tar.gz). I followed the ULFM setup (at http://fault-tolerance.org/ulfm/ulfm-setup/). T

Re: [OMPI users] Fault Tolerant Features in OpenMPI

2013-08-12 Thread George Bosilca
Edson, Based on your questions I would suggest you take a look at the ULFM-enabled version of Open MPI. You can find it at http://fault-tolerance.org/. George. On Aug 11, 2013, at 15:33 , Edson Tavares de Camargo wrote: > Thanks a lot for your reply, Ralph! > > Could you tell me in what si

Re: [OMPI users] Fault Tolerant Features in OpenMPI

2013-08-11 Thread Ralph Castain
On Aug 11, 2013, at 6:33 AM, Edson Tavares de Camargo wrote:
> Thanks a lot for your reply, Ralph!
>
> Could you tell me in what situation the error handler would be called in
> the 1.6.5 version?
Only when an error is detected in the MPI layer
>
> I had thought that a failure in a process

Re: [OMPI users] Fault Tolerant Features in OpenMPI

2013-08-11 Thread Edson Tavares de Camargo
Thanks a lot for your reply, Ralph! Could you tell me in what situation the error handler would be called in the 1.6.5 version? I had thought that a failure in a process would be caught by the error handler. Wouldn't killing or aborting the process produce the same behaviour? In the 1.7.4 release if a proc

Re: [OMPI users] Fault Tolerant Features in OpenMPI

2013-08-10 Thread Ralph Castain
The error handler wouldn't be called in that situation - we simply abort the job. We expect to provide that integration in something like the 1.7.4 release milestone. On Aug 10, 2013, at 11:07 AM, Edson Tavares de Camargo wrote: > Hi All, > > I was looking for posts about fault tolerance in

[OMPI users] Fault Tolerant Features in OpenMPI

2013-08-10 Thread Edson Tavares de Camargo
Hi All, I was looking for posts about fault tolerance in MPI and I found the post below: http://www.open-mpi.org/community/lists/users/2012/06/19658.php I am trying to understand all the work on failure detection present in Open MPI. So, I began with a simple application, a ring application (rin

[OMPI users] Re: [OMPI users] Reply: [OMPI users] Fault Tolerant Features in OpenMPI

2012-06-25 Thread Josh Hursey
The official support page for the C/R features is hosted by Indiana University (linked from the Open MPI FAQs): http://osl.iu.edu/research/ft/ompi-cr/ The instructions probably need to be cleaned up (some of the release references are not quite correct any longer). But the following should give

[OMPI users] Reply: [OMPI users] Fault Tolerant Features in OpenMPI

2012-06-25 Thread 陈松
THANK YOU for your detailed answer. [quote]If you want a fault tolerance feature, such as automatic checkpoint/restart recovery, you will need to create a build of OpenMPI with that feature enabled. There are instructions on the various links above about how to do so.[/quote] Could you give me some