Dear all,
May I help in this context? I can't promise to do big things or to be
highly available, because I may get busier at work. I am also not sure
whether my company will allow this. Anyway, I may do this in my spare time.
Thanks & Regards,
On 12/23/09, Ralph Castai
That's just OMPI's default behavior - as Josh said, we are working towards
allowing other behaviors, but for now, this is what we have.
On Dec 23, 2009, at 5:40 AM, vipin kumar wrote:
> Thank you Ralph,
>
> I did as you said. Programs are running fine, But still killing one process
> leads to
Thank you Ralph,
I did as you said. The programs are running fine, but killing one process
still terminates all processes. Am I missing something? Does anything else
need to be called along with MPI::Comm::Disconnect()?
Thanks & Regards,
On Mon, Dec 21, 2009 at 8:00 PM, Ralph Castain wrote:
> Disconnect
Disconnect is a -collective- operation. Both parent and child have to call it.
Your child process is "hanging" while it waits for the parent.
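
To make the "collective" point concrete, here is a small analogy in plain Python. It is not MPI code and does not use any MPI library: it uses threading.Barrier as a stand-in for a call that only completes once every participant has made it. The helper names (collective_call, run) are made up for this illustration, and the timeout exists only so the demo terminates; a real collective call would simply keep waiting.

```python
import threading


def collective_call(barrier, timeout):
    """Model of a collective call: returns True only if all participants arrive.

    Returns False if this caller gave up waiting (the timeout is an
    assumption for the demo; a real collective would block indefinitely).
    """
    try:
        barrier.wait(timeout=timeout)
        return True
    except threading.BrokenBarrierError:
        return False


def run(callers, total=2, timeout=0.5):
    """Start `callers` threads that each make the collective call.

    `total` is the number of participants the call expects (parent + child).
    """
    barrier = threading.Barrier(total)
    outcomes = []

    def participant():
        outcomes.append(collective_call(barrier, timeout))

    threads = [threading.Thread(target=participant) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


if __name__ == "__main__":
    # Only the "child" calls: it waits for a partner that never arrives.
    print(run(callers=1))
    # Both "parent" and "child" call: the operation completes for both.
    print(run(callers=2))
```

The first run models the hanging child described above; the second models the case where both sides make the call.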
On Dec 21, 2009, at 1:37 AM, vipin kumar wrote:
> Hello folks,
>
> As I explained my problem earlier, I am looking for Fault Tolerance in MPI
> Programs
Hello folks,
As I explained earlier, I am looking for Fault Tolerance in MPI
Programs. I read in the MPI 2.1 standard document that two DISCONNECTED
processes do not affect each other, i.e. they can die or be killed
without affecting other processes.
So, I was trying th
Unfortunately I cannot provide a precise time frame for availability
at this point, but we are targeting the v1.5 release series. There is
a handful of core developers working on this issue at the moment.
Pieces of this work have already made it into the Open MPI
development trunk. If you
Hi Josh,
It is good to hear that work towards resiliency in Open MPI is in
progress. I have been waiting for this capability. I have almost finished
my development work and am waiting for it so that I can test my
programs. It would be good if you could tell how long i
Task-farm or manager/worker recovery models typically depend on
intercommunicators (i.e., from MPI_Comm_spawn) and a resilient MPI
implementation. William Gropp and Ewing Lusk have a paper entitled
"Fault Tolerance in MPI Programs" that outlines how an application
might take advantage of th
Is that kind of approach possible within an MPI framework? Perhaps a
grid approach would be better. More experienced people, speak up,
please?
(The reason I say that is that I too am interested in the solution of
that kind of problem, where an individual blade of a blade server
fails and correcting
Hi
I guess "task-farming" could give you a certain amount of the kind of
fault-tolerance you want.
(i.e. a master process distributes tasks to idle slave processors;
however, this will only work if the slave processes don't need to
communicate with each other)
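
As a rough illustration of the task-farming idea, here is a minimal sketch in plain Python rather than MPI. It assumes each task is independent, runs every task in its own operating-system process via subprocess, and has the master re-queue any task whose worker died. The names WORKER_SRC, start_worker and task_farm, the squaring "work", and the crash_on_first_try switch used to simulate a killed worker are all invented for this example.

```python
import subprocess
import sys

# Worker program (hypothetical): squares its argument, or dies abruptly
# without reporting a result when told to simulate a fault.
WORKER_SRC = """
import os, sys
if sys.argv[2] == "crash":
    os._exit(1)
print(int(sys.argv[1]) ** 2)
"""


def start_worker(value, crash=False):
    """Launch one worker process for one task and return its handle."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SRC, str(value), "crash" if crash else "ok"],
        stdout=subprocess.PIPE,
        text=True,
    )


def task_farm(tasks, crash_on_first_try=frozenset(), max_attempts=3):
    """Master loop: farm out tasks and retry those whose worker died.

    Workers never talk to each other, so a dead worker costs only its own
    task. Tasks must be unique and hashable in this sketch.
    """
    results = {}
    pending = list(tasks)
    for attempt in range(1, max_attempts + 1):
        if not pending:
            break
        # Start one worker per pending task; they run concurrently.
        procs = {
            t: start_worker(t, crash=(attempt == 1 and t in crash_on_first_try))
            for t in pending
        }
        pending = []
        for t, proc in procs.items():
            out, _ = proc.communicate()
            if proc.returncode == 0:
                results[t] = int(out)
            else:
                pending.append(t)  # worker died: hand the task out again
    return results


if __name__ == "__main__":
    done = task_farm([1, 2, 3, 4], crash_on_first_try={3})
    print(sorted(done.items()))  # [(1, 1), (2, 4), (3, 9), (4, 16)]
```

The worker for task 3 dies on its first attempt, yet the other results are unaffected and task 3 is simply handed out again. This is only possible because the workers share no state and exchange no messages, which is the restriction noted above.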
Jody
On Mon, Aug 3, 2009 at 1:24
Hi all,
Thanks Durga for your reply.
Jeff, you once wrote code for the Mandelbrot set to demonstrate fault
tolerance in LAM-MPI, i.e. killing any slave process doesn't affect the
others. That is exactly the behaviour I am looking for in Open MPI. I
attempted it, but had no luck. Can you please tell how to write such program
Although I have perhaps the least experience on the topic in this
list, I will take a shot; more experienced people, please correct me:
The MPI standard specifies communication mechanisms, not fault tolerance
at any level. You may achieve network tolerance at the IP level by
implementing 'equal cost mul
12 matches