We have a very hot CF which we use essentially as a durable in-memory cache for our application. It is about 70 MB in size once fully populated. We completely overwrite the entire CF every few minutes (overwrite, not delete). Our hope was that the CF would stay around 70 MB on disk, but it grows to multiple gigabytes rather quickly (in under an hour). I've heard that forcing major compactions with nodetool is no longer recommended, but when we run nodetool compact on this CF and then trigger a JVM GC, the size on disk shrinks back to the expected 70 MB.
I'm wondering if we are doing something wrong here. We thought we were avoiding tombstones, since we are just overwriting each column using the same keys. Is the fact that we have to trigger a GC to get the size on disk to shrink significantly a smoking gun that we have a bunch of tombstones? We've row-cached the entire CF to make reads really fast, and writes are definitely fast enough; it's the growing disk usage that has us concerned.

Here's the output from nodetool cfstats for the CF in question (hrm, I just noticed that we still have a key cache for this CF, which is rather dumb):

    Column Family: Test
    SSTable count: 4
    Space used (live): 309767193
    Space used (total): 926926841
    Number of Keys (estimate): 275456
    Memtable Columns Count: 37510
    Memtable Data Size: 15020598
    Memtable Switch Count: 22
    Read Count: 4827496
    Read Latency: 0.010 ms.
    Write Count: 1615946
    Write Latency: 0.095 ms.
    Pending Tasks: 0
    Key cache capacity: 150000
    Key cache size: 55762
    Key cache hit rate: 0.030557854052177317
    Row cache capacity: 150000
    Row cache size: 68752
    Row cache hit rate: 1.0
    Compacted row minimum size: 925
    Compacted row maximum size: 1109
    Compacted row mean size: 1109

Any insight appreciated.

Thanks,
-Derek
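P.S. For concreteness, here is a sketch of the sequence we run to reclaim the space (MyKeyspace is a stand-in for our actual keyspace name, and the host is just an example):

```shell
# Force a major compaction of this one CF, merging its SSTables
# and discarding the overwritten column versions.
# "MyKeyspace" is a placeholder for our real keyspace name.
nodetool -h localhost compact MyKeyspace Test

# The obsolete SSTable files only get unlinked after a JVM garbage
# collection, so we then trigger a GC on the node over JMX
# (e.g. via jconsole, java.lang:type=Memory -> gc operation).

# After that, stats show the live size back around 70 MB.
nodetool -h localhost cfstats
```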