On Oct 24, 2012, at 6:05 PM, aaron morton wrote:
> Hints store the columns, row key, KS name and CF id(s) for each mutation to
> each node, whereas an executed mutation stores only the most recent columns,
> collated with others under the same row key. So, depending on the type of
> mutation, hints will take up more space.
>
> The worst case would be lots of overwrites. After that, writing a small amount
> of data to many rows would result in a lot of the serialised space being
> devoted to row keys, KS name and CF id.
>
> 16Gb is a lot though. What was the write workload like?
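To make the point about small writes concrete, here is a rough sketch of how much of a single hint goes to per-hint metadata when every hint carries its own row key, KS name and CF id. The function name `hint_overhead_fraction` and all byte sizes (36-byte row key, 12-byte keyspace name, 4-byte CF id) are hypothetical assumptions for illustration, not Cassandra's actual serialization format.

```python
# Hypothetical model: each hint carries its own copy of the row key,
# keyspace name and column family id alongside the column data.
# All sizes are illustrative assumptions, not Cassandra's real format.

def hint_overhead_fraction(payload_bytes: int, row_key_bytes: int,
                           ks_name_bytes: int, cf_id_bytes: int) -> float:
    """Return the fraction of one hint's size spent on per-hint metadata."""
    metadata = row_key_bytes + ks_name_bytes + cf_id_bytes
    return metadata / (metadata + payload_bytes)

# Assumed metadata sizes: 36-byte row key, 12-byte keyspace name, 4-byte CF id.
small = hint_overhead_fraction(100, 36, 12, 4)      # ~100-byte column
large = hint_overhead_fraction(34_816, 36, 12, 4)   # 2 x 17 KiB payload

print(f"small write: {small:.1%} of the hint is metadata")
print(f"large write: {large:.1%} of the hint is metadata")
```

Under these assumptions the fixed metadata dominates a ~100-byte write but is negligible next to a 34 KiB payload, which is the shape of the effect described above.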
Each write is new data only (no overwrites). Each mutation adds a new row with
one column containing ~100 bytes of data to one column family, and a new row
with a SuperColumn containing 2x17 KiB payloads to another column family. The
mutations are sent in batches of several, but I found that the storage overhead
was the same regardless of batch size (i.e., 5 vs. 25 mutations per batch made
no difference). A total of 1,000,000 such mutations are sent over the duration
of the test.

> You can get an estimate of the number of keys in the Hints CF using nodetool
> cfstats. Also, some JMX metrics will tell you how many hints are
> stored.
>
>> This has a huge impact on write performance as well.
>
> Yup. Hints are added to the same Mutation thread pool as normal mutations.
> They are processed asynchronously to the mutation request, but they still
> take resources to store.
>
> You can adjust how long hints are collected for with max_hint_window_in_ms in
> the yaml file.
>
> How long did the test run for?

With both data centers functional, the test takes just a few minutes to run;
with one data center down, it takes 15x as long.

/dml
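For scale, the raw payload of the workload described above can be worked out directly. This sketch takes "~100 bytes" as exactly 100 and KiB as 1024 bytes, and counts column values only: row keys, column names, timestamps and the per-hint metadata are ignored. Since hints are stored per target node, actual hint volume would also scale with the number of unreachable replicas.

```python
KIB = 1024

small_column_bytes = 100            # ~100-byte column in the first CF (assumed exact)
super_column_bytes = 2 * 17 * KIB   # two 17 KiB payloads in the SuperColumn CF
mutations = 1_000_000               # total mutations sent during the test

per_mutation = small_column_bytes + super_column_bytes
total = per_mutation * mutations

print(f"per mutation: {per_mutation:,} bytes")
print(f"total raw payload: {total / 2**30:.1f} GiB")
```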