Re: 33million hinted handoffs from nowhere

Andras Szerdahelyi Wed, 20 Mar 2013 03:11:07 -0700

Thanks, Aaron.

I re-enabled hinted handoff and noted the following


  *   no host is marked down in nodetool ring
  *   No host is logged as down or dead in logs
  *   No "started hinted handoff for.." is logged
  *   The hinted handoff manager Mbean lists pending hints to .. (drumroll) 3 
non-existent nodes?

Here's my ring

Note: Ownership information does not include topology, please specify a 
keyspace.
Address         DC          Rack        Status State   Load            Owns     
           Token
                                                                                
           113427455640312821154458202477256070785
XX.XX.1.113    ione-us-atl rack1       Up     Normal  382.08 GB       33.33%    
          0
XX.XX.31.10      ione-us-lvg rack1       Up     Normal  266.04 GB       0.00%   
            100
XX.XX.0.71     ione-be-bru rack1       Up     Normal  85.86 GB        0.00%     
          200
XX.XX.2.86     ione-analytics-us-atlrack1       Up     Normal  153.6 GB        
0.00%               300
XX.XX.1.45     ione-us-atl-ssdrack1       Up     Normal  296.72 GB       0.00%  
             400
XX.XX.2.85     ione-analytics-us-atlrack1       Up     Normal  100.3 GB        
33.33%              56713727820156410577229101238628035542
XX.XX.1.204    ione-us-atl rack1       Up     Normal  341.55 GB       16.67%    
          85070591730234615865843651857942052864
XX.XX.11      ione-us-lvg rack1       Up     Normal  320.22 GB       0.00%      
         85070591730234615865843651857942052964
XX.XX.2.87     ione-analytics-us-atlrack1       Up     Normal  166.48 GB       
16.67%              113427455640312821154458202477256070785

And these are nodes pending hints according to the Mbean

166860289390734216023086131251507064403
143927757573010354572009627285182898319
24295500190543334543807902779534181373

Err.. Unbalanced ring ? Opscenter says otherwise ( "OpsCenter has detected that 
the token ranges are evenly distributed across the nodes in each data center. 
Load rebalancing is not necessary at this time." )

I appreciate your help so far! In the mean time hintedhandOFF because my 
mutation TP can't keep up with this traffic, not to mention compaction..

Thanks,
Andras

ps: all nodes are cassandra-1.1.6-dse-p1

From: aaron morton <aa...@thelastpickle.com<mailto:aa...@thelastpickle.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Monday 18 March 2013 17:51
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: 33million hinted handoffs from nowhere

You can check which nodes hints are being held for using the JMX api. Look for 
the org.apache.cassandra.db:type=HintedHandoffManager MBean and call the 
listEndpointsPendingHints() function.

There are two points where hints may be stored, if the node is down when the 
request started or if the node timed out and did not return before rpc_timeout. 
To check for the first, look for log lines about a node being "dead" on the 
coordinator. To check for the second look for dropped messages on the other 
nodes. This will be logged, or you can use nodetool tpstats to look for them.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/03/2013, at 2:30 AM, Andras Szerdahelyi 
<andras.szerdahe...@ignitionone.com<mailto:andras.szerdahe...@ignitionone.com>> 
wrote:

( The previous letter was sent prematurely, sorry. )

This node is the only node being written to, but the Cfs being written 
replicate to almost all of the other nodes
My understanding is that hinted handoff is mutations kept around on the 
coordinator node, to be replayed when the target node re-appears on the ring. 
All my nodes are up and again, no hinted handoff is logged on the node itself

Thanks!
Andras

From: Andras Szerdahelyi 
<andras.szerdahe...@ignitionone.com<mailto:andras.szerdahe...@ignitionone.com>>
Date: Thursday 14 March 2013 14:25
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: 33million hinted handoffs from nowhere

Hi list,

I am experiencing seemingly uncontrollable and unexplained growth of my 
HintedHandoff CF on a single node. Unexplained because there are no hinted 
handoffs being logged on the node, uncontrollable because I see 33 million 
inserts in cfstats and the size of the stables is over 10 gigs all in an hour 
of uptime.


I have done the following to try and reproduce this:

- shut down my cluster
- on all nodes: remove sstables from the HintsColumnFamily data dir
- on all nodes: remove commit logs
- start all nodes but the one that’s showing this problem
- nothing is writing to any of the nodes. There are no hinted handoff going on 
anywhere
- bring back the node in question last
- few seconds after boot:

                Column Family: HintsColumnFamily
                SSTable count: 1
                Space used (live): 44946532
                Space used (total): 44946532
                Number of Keys (estimate): 256
                Memtable Columns Count: 17840
                Memtable Data Size: 17569909
                Memtable Switch Count: 2
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 184836
                Write Latency: 0.668 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 0
                Bloom Filter False Ratio: 0.00000
                Bloom Filter Space Used: 16
                Compacted row minimum size: 20924301
                Compacted row maximum size: 25109160
                Compacted row mean size: 25109160

Re: 33million hinted handoffs from nowhere

Reply via email to