Re: frequent node UP/Down?

Yang Sun, 25 Sep 2011 11:10:39 -0700

Thanks Brandon.

I'll try this.


But you can also see my later post regarding dropped messages:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E

That seems to show that something in either the code or the background
load is causing messages to actually be dropped.


Yang

On Sun, Sep 25, 2011 at 10:59 AM, Brandon Williams <[email protected]> wrote:
> On Sun, Sep 25, 2011 at 12:52 PM, Yang <[email protected]> wrote:
>> Thanks Brandon.
>>
>> I suspected that, but I think that's ruled out as a possibility, since
>> I set up another background job to do
>> echo | nc other_box 7000
>> in a loop.
>> This job seems to be working fine all the time, so the network seems fine.
>
> This isn't measuring latency, however, and latency is what the failure
> detector works from: it uses probability to estimate the likelihood that
> a given host is alive, based on previous history.  The situation on EC2
> is something like the following: 99% of pings are 1ms, but sometimes
> there are brief periods of 100ms, and this is where the FD says "this
> is not realistic, I think the host is dead" but then receives the
> ping, and thus the flapping.  I've seen it a million times, and
> increasing the phi threshold always solves it.
>
> -Brandon
>
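
To make the point about the phi threshold concrete, here is a minimal
sketch of an accrual-style failure detector. It is not Cassandra's code:
the class name, the window size, and the exponential model of heartbeat
intervals are assumptions chosen for illustration, and the time units
are arbitrary. It only shows why a brief stall can push phi over a low
threshold while a higher threshold tolerates it.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy accrual failure detector (hypothetical, for illustration only).

    Assumes heartbeat inter-arrival times are exponentially distributed,
    so P(no heartbeat for t) = exp(-t / mean) and
    phi = -log10(P) = t / (mean * ln 10).
    """

    def __init__(self, window=1000):
        # Sliding window of recent inter-arrival intervals.
        self.intervals = deque(maxlen=window)
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level at time `now`; higher means more likely dead."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_arrival
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in range(0, 101):
    # Regular heartbeats, one per time unit, from t=0 to t=100.
    detector.heartbeat(float(t))

# The peer then goes quiet for 20 time units.
phi_value = detector.phi(120.0)
for threshold in (8, 12):
    verdict = "DOWN" if phi_value > threshold else "UP"
    print(f"threshold={threshold}: phi={phi_value:.2f} -> {verdict}")
```

With these made-up numbers the same pause marks the node down at a
threshold of 8 but not at 12, which is the flapping-versus-stable
difference described above. In Cassandra the corresponding setting is,
to my knowledge, phi_convict_threshold in cassandra.yaml.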
