Were you bootstrapping or otherwise moving nodes around? I don't think anyone's tracked this bug down farther than "if you restart the entire cluster, it goes away."
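To spot which node has the stale view, one option is to diff the `nodetool ring` output from each host. Below is a rough sketch under stated assumptions, not an existing Cassandra tool: the helper names `ring_members` and `out_of_sync` are made up for illustration, and it assumes you have already captured each node's ring output as text. It only looks for lines that begin with an IPv4 address, because in this output the address column runs straight into the status column ("192.168.20.156Up").

```python
import re

# Assumption: every ring member appears as an IPv4 address at the start of a
# line. Token-only lines and log lines do not match and are skipped.
_ADDR = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})")


def ring_members(ring_output):
    """Return the set of node addresses listed in one `nodetool ring` dump."""
    members = set()
    for line in ring_output.splitlines():
        match = _ADDR.match(line)
        if match:
            members.add(match.group(1))
    return members


def out_of_sync(views):
    """Find nodes whose ring view is missing members that other nodes report.

    `views` maps a node address to the ring output captured from that node.
    Returns {node: sorted list of addresses absent from that node's view}.
    """
    parsed = {node: ring_members(text) for node, text in views.items()}
    everyone = set().union(*parsed.values())
    return {
        node: sorted(everyone - seen)
        for node, seen in parsed.items()
        if everyone - seen
    }
```

Fed the two dumps quoted below, this should flag 192.168.20.158 as missing the other four nodes. It only tells you which view disagrees with the rest, not why.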

On Wed, May 19, 2010 at 10:05 PM, Keith Thornhill <ke...@raptr.com> wrote:
> in a 5 node cluster, i noticed in our client error log that one of the
> nodes was consistently throwing cassandra_UnavailableException during
> a read operation.
>
> looking into jmx, it was obvious that one of the node's view of the
> ring was out of sync.
>
> $ nodetool -host 192.168.20.150 ring
> Address       Status     Load          Range                                      Ring
>                                        139508497374977076191526400448759597506
> 192.168.20.156Up         5.73 GB       733665530305941485083898696792520436       |<--|
> 192.168.20.158Up         3.41 GB       9629533262984150011756238989685472219      |   ^
> 192.168.20.154Up         2.44 GB       31048334058970902242412812423471654868     v   |
> 192.168.20.150Up         4.89 GB       105769574715070648260922426249777160699    |   ^
> 192.168.20.152Up         5.24 GB       139508497374977076191526400448759597506    |-->|
>
> $ nodetool -host 192.168.20.158 ring
> Address       Status     Load          Range                                      Ring
> 192.168.20.158Up         3.41 GB       9629533262984150011756238989685472219      |<--|
>
> looking at the CF stats on that node, it is obvious that reads and
> writes are happening, but i have to assume that those are coming from
> proxy connections via the other nodes.
>
> when restarting that node, the error logs in the other cluster nodes
> show that they detect the server going away and then coming back into
> the ring.
>
> INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:39,448 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
> INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:55,475 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
> INFO [GMFD:1] 2010-05-19 21:27:56,481 Gossiper.java (line 582) Node /192.168.20.158 has restarted, now UP again
> INFO [GMFD:1] 2010-05-19 21:27:56,482 StorageService.java (line 538) Node /192.168.20.158 state jump to normal
>
> any ideas on how to kick that node and remind it of its buddies?
>
> thanks!
> -keith
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com