On Sun, Sep 25, 2011 at 12:52 PM, Yang <teddyyyy...@gmail.com> wrote:
> Thanks Brandon.
>
> I suspected that, but I think that's precluded as a possibility since
> I setup another background job to do
> echo | nc other_box 7000
> in a loop,
> this job seems to be working fine all the time, so network seems fine.

This isn't measuring latency, however.  That is how the failure
detector works, using probability to estimate the likelihood that a
given host is alive, based on previous history.  The situation on ec2
is something like the following: 99% of pings are 1ms, but sometimes
there are brief periods of 100ms, and this is where the FD says "this
is not realistic, I think the host is dead" but then receives the
ping, and thus the flapping.  I've seen it a million times, increasing
the phi threshold always solves it.

-Brandon

Reply via email to