Take a look at nodetool gossipinfo; it will tell you which nodes this node thinks are around.
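One quick way to spot phantom entries is to diff the endpoints gossip knows about against the endpoints actually in the ring. A minimal sketch in Python — the addresses here are placeholders, standing in for whatever `nodetool gossipinfo` and `nodetool ring` print on your cluster:

```python
# Placeholder endpoint sets -- in practice, paste in the addresses printed
# by `nodetool gossipinfo` and `nodetool ring` respectively.
gossip_endpoints = {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.99"}
ring_endpoints   = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}

# Anything gossip knows about that the ring does not is a candidate
# phantom entry.
phantoms = sorted(gossip_endpoints - ring_endpoints)
print(phantoms)  # -> ['10.0.0.99']
```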
If you can see something in gossip that should not be there, you have a few choices:

* If it's less than 3 days since the change to ring topology, wait and see if C* sorts it out.
* Try restarting nodes with -Dcassandra.load_ring_state=false as a JVM opt in cassandra-env.sh. This may not work, because when the node restarts the others will tell it the bad info again.
* Try the unsafeAssassinateEndpoint() call on the Gossiper MBean via JMX: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/gms/GossiperMBean.java#L28

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/03/2013, at 11:10 PM, Andras Szerdahelyi <andras.szerdahe...@ignitionone.com> wrote:

> Thanks, Aaron.
>
> I re-enabled hinted handoff and noted the following:
> • No host is marked down in nodetool ring
> • No host is logged as down or dead in the logs
> • No "started hinted handoff for.." is logged
> • The hinted handoff manager MBean lists pending hints to .. (drumroll)
>   3 non-existent nodes?
>
> Here's my ring:
>
> Note: Ownership information does not include topology, please specify a
> keyspace.
> Address      DC                     Rack   Status  State   Load       Owns    Token
>                                                                               113427455640312821154458202477256070785
> XX.XX.1.113  ione-us-atl            rack1  Up      Normal  382.08 GB  33.33%  0
> XX.XX.31.10  ione-us-lvg            rack1  Up      Normal  266.04 GB  0.00%   100
> XX.XX.0.71   ione-be-bru            rack1  Up      Normal  85.86 GB   0.00%   200
> XX.XX.2.86   ione-analytics-us-atl  rack1  Up      Normal  153.6 GB   0.00%   300
> XX.XX.1.45   ione-us-atl-ssd        rack1  Up      Normal  296.72 GB  0.00%   400
> XX.XX.2.85   ione-analytics-us-atl  rack1  Up      Normal  100.3 GB   33.33%  56713727820156410577229101238628035542
> XX.XX.1.204  ione-us-atl            rack1  Up      Normal  341.55 GB  16.67%  85070591730234615865843651857942052864
> XX.XX.11     ione-us-lvg            rack1  Up      Normal  320.22 GB  0.00%   85070591730234615865843651857942052964
> XX.XX.2.87   ione-analytics-us-atl  rack1  Up      Normal  166.48 GB  16.67%  113427455640312821154458202477256070785
>
> And these are the nodes pending hints according to the MBean:
>
> 166860289390734216023086131251507064403
> 143927757573010354572009627285182898319
> 24295500190543334543807902779534181373
>
> Err.. unbalanced ring? OpsCenter says otherwise ("OpsCenter has detected
> that the token ranges are evenly distributed across the nodes in each data
> center. Load rebalancing is not necessary at this time.")
>
> I appreciate your help so far! In the meantime, hinted handoff is OFF, because my
> mutation TP can't keep up with this traffic, not to mention compaction..
>
> Thanks,
> Andras
>
> ps: all nodes are cassandra-1.1.6-dse-p1
>
> From: aaron morton <aa...@thelastpickle.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday 18 March 2013 17:51
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: 33million hinted handoffs from nowhere
>
> You can check which nodes hints are being held for using the JMX API. Look
> for the org.apache.cassandra.db:type=HintedHandoffManager MBean and call the
> listEndpointsPendingHints() function.
>
> There are two points where hints may be stored: if the node is down when the
> request started, or if the node timed out and did not return before
> rpc_timeout. To check for the first, look for log lines about a node being
> "dead" on the coordinator. To check for the second, look for dropped messages
> on the other nodes. These will be logged, or you can use nodetool tpstats to
> look for them.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/03/2013, at 2:30 AM, Andras Szerdahelyi
> <andras.szerdahe...@ignitionone.com> wrote:
>
>> (The previous letter was sent prematurely, sorry.)
>>
>> This node is the only node being written to, but the CFs being written to
>> replicate to almost all of the other nodes.
>> My understanding is that hinted handoff is mutations kept around on the
>> coordinator node, to be replayed when the target node re-appears on the
>> ring. All my nodes are up, and again, no hinted handoff is logged on the node
>> itself.
>>
>> Thanks!
>> Andras
>>
>> From: Andras Szerdahelyi <andras.szerdahe...@ignitionone.com>
>> Date: Thursday 14 March 2013 14:25
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: 33million hinted handoffs from nowhere
>>
>> Hi list,
>>
>> I am experiencing seemingly uncontrollable and unexplained growth of my
>> HintedHandoff CF on a single node. Unexplained because there are no hinted
>> handoffs being logged on the node; uncontrollable because I see 33 million
>> inserts in cfstats and the size of the sstables is over 10 gigs, all in an
>> hour of uptime.
>>
>> I have done the following to try and reproduce this:
>>
>> - shut down my cluster
>> - on all nodes: remove sstables from the HintsColumnFamily data dir
>> - on all nodes: remove commit logs
>> - start all nodes but the one that's showing this problem
>> - nothing is writing to any of the nodes.
>> There are no hinted handoffs going
>> on anywhere
>> - bring back the node in question last
>> - a few seconds after boot:
>>
>> Column Family: HintsColumnFamily
>> SSTable count: 1
>> Space used (live): 44946532
>> Space used (total): 44946532
>> Number of Keys (estimate): 256
>> Memtable Columns Count: 17840
>> Memtable Data Size: 17569909
>> Memtable Switch Count: 2
>> Read Count: 0
>> Read Latency: NaN ms.
>> Write Count: 184836
>> Write Latency: 0.668 ms.
>> Pending Tasks: 0
>> Bloom Filter False Postives: 0
>> Bloom Filter False Ratio: 0.00000
>> Bloom Filter Space Used: 16
>> Compacted row minimum size: 20924301
>> Compacted row maximum size: 25109160
>> Compacted row mean size: 25109160
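For what it's worth, the "non-existent node" tokens reported by the HintedHandoffManager MBean can be placed on the ring by hand. A sketch in Python, using the tokens quoted in this thread — assuming (as the MBean output suggests for 1.1) that hint rows are keyed by the target node's token, and using the RandomPartitioner convention that a token is owned by the first node token at or after it, wrapping around the ring. None of the pending tokens matches a node token exactly, which is consistent with them looking like phantom nodes:

```python
from bisect import bisect_left

def ring_owner(token, ring):
    """Return the node token owning `token`: the first ring token >= it,
    wrapping past the highest token back to the lowest."""
    tokens = sorted(ring)
    i = bisect_left(tokens, token)
    return tokens[i % len(tokens)]

# Node tokens from the `nodetool ring` output in this thread.
ring = [0, 100, 200, 300, 400,
        56713727820156410577229101238628035542,
        85070591730234615865843651857942052864,
        85070591730234615865843651857942052964,
        113427455640312821154458202477256070785]

# Tokens the HintedHandoffManager MBean reported as pending hints.
pending = [166860289390734216023086131251507064403,
           143927757573010354572009627285182898319,
           24295500190543334543807902779534181373]

# None of the pending tokens is an actual node token...
print(any(t in ring for t in pending))  # -> False

# ...but each falls inside some node's range on the ring.
for t in pending:
    print(t, "->", ring_owner(t, ring))
```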