I'm seeing this on 0.8.x as well. It looks to me like it happens when the node is under heavy load, for example during unthrottled compactions or a long GC pause.
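
One way to check that theory is to measure how long each node stays marked dead and compare those windows with GC or compaction activity at the same timestamps. Below is a rough sketch in Python, assuming the log lines look exactly like the two quoted below. The function name `flap_durations` and the regular expression are my own, not part of Cassandra.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, taken from the quoted log excerpt:
#   INFO 20:30:12,726 InetAddress /10.71.111.222 is now dead.
EVENT_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3}) InetAddress /(\S+) is now (dead|UP)"
)


def flap_durations(lines):
    """Return a list of (address, seconds_down) for each dead -> UP pair."""
    down_since = {}  # address -> time it was last marked dead
    flaps = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a failure-detector event
        stamp = datetime.strptime(match.group(1), "%H:%M:%S,%f")
        address, state = match.group(2), match.group(3)
        if state == "dead":
            down_since[address] = stamp
        elif address in down_since:
            delta = stamp - down_since.pop(address)
            # The log carries no date, so assume a wrap past midnight
            # when the UP time is earlier than the dead time.
            if delta < timedelta(0):
                delta += timedelta(days=1)
            flaps.append((address, delta.total_seconds()))
    return flaps


if __name__ == "__main__":
    sample = [
        "INFO 20:30:12,726 InetAddress /10.71.111.222 is now dead.",
        "INFO 20:30:32,154 InetAddress /10.71.111.222 is now UP",
    ]
    for address, seconds in flap_durations(sample):
        print(f"{address} was marked dead for {seconds:.3f} s")
```

If the dead windows consistently overlap long GC pauses or compaction runs recorded in the same log, that points at load on the node rather than at the network.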

2011/9/24 Yang <teddyyyy...@gmail.com>

> I'm using 1.0.0
>
> there seems to be too many node Up/Dead events detected by the failure
> detector.
> I'm using a 2 node cluster on EC2, in the same region, same security
> group, so I assume the message drop rate should be fairly low.
> but in about every 5 minutes, I'm seeing some node detected as down,
> and then Up again quickly, like the following
>
> INFO 20:30:12,726 InetAddress /10.71.111.222 is now dead.
> INFO 20:30:32,154 InetAddress /10.71.111.222 is now UP
>
> does the "1 in every 5 minutes" sound roughly right for your setup? I
> just want to make sure the unresponsiveness is not caused by something
> like memtable flushing, or GC, which I can probably further tune.
>
> Thanks
> Yang