Thanks Brandon. I'll try this.
But you can also see my later post regarding message drops:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E

That seems to show something in either the code or the background load
causing messages to really be dropped.

Yang

On Sun, Sep 25, 2011 at 10:59 AM, Brandon Williams <[email protected]> wrote:
> On Sun, Sep 25, 2011 at 12:52 PM, Yang <[email protected]> wrote:
>> Thanks Brandon.
>>
>> I suspected that, but I think that's precluded as a possibility since
>> I setup another background job to do
>> echo | nc other_box 7000
>> in a loop,
>> this job seems to be working fine all the time, so network seems fine.
>
> This isn't measuring latency, however. That is how the failure
> detector works, using probability to estimate the likelihood that a
> given host is alive, based on previous history. The situation on ec2
> is something like the following: 99% of pings are 1ms, but sometimes
> there are brief periods of 100ms, and this is where the FD says "this
> is not realistic, I think the host is dead" but then receives the
> ping, and thus the flapping. I've seen it a million times, increasing
> the phi threshold always solves it.
>
> -Brandon
>
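
For reference, the flapping Brandon describes can be illustrated with a
minimal sketch of a phi accrual failure detector. This is a simplification
that assumes heartbeat intervals are exponentially distributed; it is not
Cassandra's actual code, and the class and method names are hypothetical.
The threshold argument plays the role of phi_convict_threshold in
cassandra.yaml.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Hypothetical, simplified phi accrual failure detector."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold             # analogous to phi_convict_threshold
        self.intervals = deque(maxlen=window)  # recent heartbeat inter-arrival times
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level: grows with the silence relative to past history."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_arrival
        # Assuming exponential arrivals, P(silence > elapsed) = exp(-elapsed / mean),
        # so phi = -log10(P) = (elapsed / mean) * log10(e).
        return (elapsed / mean) * math.log10(math.e)

    def is_alive(self, now):
        return self.phi(now) < self.threshold
```

With heartbeats arriving once per second, a 20-second silence gives a phi of
roughly 8.7 under this model: the host is convicted at a threshold of 8 but
still considered alive at a threshold of 12. Raising the threshold makes the
detector tolerate longer pauses, at the cost of slower detection of hosts
that really are down.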
