Could this ghost node be causing my hints column family to grow to this size? The node also crashes after about 24 hours because commit log growth takes up all the drive space, though a manual nodetool flush keeps it under control.
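To keep the node alive in the meantime, that manual flush could be automated; a minimal sketch as a cron entry, assuming nodetool is on the PATH and that an hourly flush is frequent enough:

    # Stopgap, not a fix: flush memtables hourly so old commit log
    # segments can be recycled before they fill the drive. The schedule
    # and log path are assumptions; adjust to taste.
    0 * * * * nodetool -h localhost flush >> /var/log/cassandra-flush.log 2>&1

Here is the cfstats output for the hints column family: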
Column Family: HintsColumnFamily
    SSTable count: 6
    Space used (live): 666480352
    Space used (total): 666480352
    Number of Keys (estimate): 768
    Memtable Columns Count: 1043
    Memtable Data Size: 461773
    Memtable Switch Count: 3
    Read Count: 38
    Read Latency: 131.289 ms.
    Write Count: 582108
    Write Latency: 0.019 ms.
    Pending Tasks: 0
    Key cache capacity: 7
    Key cache size: 6
    Key cache hit rate: 0.8333333333333334
    Row cache: disabled
    Compacted row minimum size: 2816160
    Compacted row maximum size: 386857368
    Compacted row mean size: 120432714

Is there a way for me to manually remove this dead node?

-----Original Message-----
From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Sunday, August 21, 2011 9:09 PM
To: user@cassandra.apache.org
Subject: RE: Completely removing a node from the cluster

It's been at least 4 days now.

-----Original Message-----
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Sunday, August 21, 2011 3:16 PM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

I see the mistake I made about ring: it gets the endpoint list from the same place, but uses the tokens to drive the whole process.

I'm guessing here, as I don't have time to check all the code, but there is a 3-day timeout in the gossip system. Not sure if it applies in this case. Anyone know?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:

> Both .2 and .3 list the same from the MBean: UnreachableNodes is an empty
> collection, and LiveNodes still lists all 3 nodes:
> 192.168.20.2
> 192.168.20.3
> 192.168.20.1
>
> The removetoken was done a few days ago, and I believe the remove was done
> from .2
>
> Here is what the ring output looks like; not sure why I get that token on
> the empty first line either:
> Address       DC          Rack   Status State   Load      Owns    Token
>                                                                   85070591730234615865843651857942052864
> 192.168.20.2  datacenter1 rack1  Up     Normal  79.53 GB  50.00%  0
> 192.168.20.3  datacenter1 rack1  Up     Normal  42.63 GB  50.00%  85070591730234615865843651857942052864
>
> Yes, both nodes show the same thing when doing a describe cluster: that .1
> is unreachable.
>
> -----Original Message-----
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Sunday, August 21, 2011 4:23 AM
> To: user@cassandra.apache.org
> Subject: Re: Completely removing a node from the cluster
>
> Unreachable nodes either did not respond to the message or were known to
> be down and were not sent a message.
> The node lists for the ring command and describe cluster are obtained the
> same way, so it's a bit odd.
>
> Can you connect to JMX and have a look at the o.a.c.db.StorageService
> MBean? What do the LiveNodes and UnreachableNodes attributes say?
>
> Also, how long ago did you remove the token, and on which machine? Do both
> 20.2 and 20.3 think 20.1 is still around?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
>
>> I'm on 0.8.4
>>
>> I have removed a dead node from the cluster using the nodetool removetoken
>> command, and moved one of the remaining nodes to rebalance the tokens.
>> Everything looks fine when I run nodetool ring now: it lists only the
>> remaining 2 nodes, and they both look fine, each owning 50% of the tokens.
>>
>> However, I can still see it being considered part of the cluster from
>> cassandra-cli (192.168.20.1 being the removed node), and I'm worried that
>> the cluster is still queuing up hints for the node, or about other issues
>> it may cause:
>>
>> Cluster Information:
>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>    Schema versions:
>>        dcc8f680-caa4-11e0-0000-553d4dced3ff: [192.168.20.2, 192.168.20.3]
>>        UNREACHABLE: [192.168.20.1]
>>
>> Do I need to do something else to completely remove this node?
>>
>> Thanks,
>> Bryce
>
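On the manual-removal question at the top of the thread, a hedged sketch of what has been reported to work on 0.8.x. removetoken force only applies while a removal is still pending, and clearing hints requires Cassandra to be stopped on the node holding them; the init script and data directory paths are assumptions:

    # If the earlier removetoken is stuck, check it and force it to
    # completion from a live node:
    nodetool -h 192.168.20.2 removetoken status
    nodetool -h 192.168.20.2 removetoken force

    # Clear the accumulated hints for the dead node: stop Cassandra,
    # delete the hints SSTables from the system keyspace, then restart.
    # /var/lib/cassandra/data is an assumed data_file_directories value.
    /etc/init.d/cassandra stop
    rm /var/lib/cassandra/data/system/HintsColumnFamily-*
    /etc/init.d/cassandra start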
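Afterwards, Aaron's StorageService MBean check can be scripted to confirm 192.168.20.1 has dropped out of the gossip view. A sketch using the third-party jmxterm client; the jar name is an assumption, and 7199 is the 0.8 default JMX port:

    # Print the LiveNodes and UnreachableNodes attributes over JMX,
    # non-interactively. Any JMX client can read the same attributes.
    echo "get -b org.apache.cassandra.db:type=StorageService LiveNodes UnreachableNodes" \
        | java -jar jmxterm-uber.jar -l localhost:7199 -n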