Re: Hinted Handoff storage inflation

Mattias Larsson Fri, 26 Oct 2012 11:56:49 -0700

On Oct 24, 2012, at 6:05 PM, aaron morton wrote:

> Hints store the columns, row key, KS name and CF id(s) for each mutation to 
> each node, whereas an executed mutation stores only the most recent columns, 
> collated with others under the same row key. So depending on the type of 
> mutation, hints will take up more space. 
> 
> The worst case would be lots of overwrites. After that, writing a small amount 
> of data to many rows would result in a lot of the serialised space being 
> devoted to row keys, KS name and CF id.
> 
> 16GB is a lot though. What was the write workload like?


Each write is new data only (no overwrites). Each mutation adds a row to one 
column family with a column containing about 100 bytes of data, and a new row 
to another column family with a SuperColumn containing 2x17KiB payloads. These 
are sent in batches of several mutations, but I found that the storage overhead 
was the same regardless of the size of the batch mutation (i.e., 5 vs 25 
mutations made no difference). A total of 1,000,000 mutations like these are 
sent over the duration of the test.
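
For scale, here is a rough back-of-the-envelope sketch of the raw payload this workload writes. It assumes exactly the sizes quoted above and ignores row keys, column names, serialization overhead, compression and replication, so it is not an estimate of how much space the hints themselves occupy. The constant and function names are made up for illustration.

```python
# Back-of-the-envelope estimate of raw client payload; not a measurement.
# All figures are taken from the workload description above.
SMALL_COLUMN_BYTES = 100          # ~100-byte column in the first column family
SUPER_PAYLOAD_BYTES = 17 * 1024   # one 17 KiB payload in the SuperColumn
PAYLOADS_PER_MUTATION = 2         # "2x17KiB"
TOTAL_MUTATIONS = 1_000_000


def payload_bytes_per_mutation():
    """Raw payload bytes carried by a single mutation (both column families)."""
    return SMALL_COLUMN_BYTES + PAYLOADS_PER_MUTATION * SUPER_PAYLOAD_BYTES


def total_payload_bytes(mutations=TOTAL_MUTATIONS):
    """Raw payload bytes written over the whole test."""
    return mutations * payload_bytes_per_mutation()


if __name__ == "__main__":
    total = total_payload_bytes()
    print(f"{payload_bytes_per_mutation()} bytes per mutation")
    print(f"{total / 2**30:.1f} GiB of raw payload in total")
```

Under these assumptions each mutation carries roughly 34 KiB, so nearly all of the data is in the SuperColumn payloads rather than in row keys or keyspace/column family identifiers.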


> You can get an estimate on the number of keys in the Hints CF using nodetool 
> cfstats. Also some metrics in the JMX will tell you how many hints are 
> stored. 
> 
>> This has a huge impact on write performance as well.
> Yup. Hints are added to the same Mutation thread pool as normal mutations. 
> They are processed async to the mutation request but they still take 
> resources to store. 
> 
> You can adjust how long hints are collected for with max_hint_window_in_ms in 
> the yaml file. 
> 
> How long did the test run for? 
> 

With both data centers functional, the test takes just a few minutes to run; 
with one data center down, it takes 15x as long.

/dml
