I wonder why the memtable estimations are so bad.
1. Is it not possible to run them more often? There should be some lower
limit - run the live/serialized calculation at least once per hour; it
takes only a few seconds.
2. Why not use the data from the FlushWriter to update the estimations?
The flusher knows the number of ops and the serialized size after the
sstable is written to disk, so these values could be fed back to update
the memtable's live/serialized ratio.
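The feedback loop suggested in (2) could look roughly like the sketch below. This is only an illustration of the idea: the class name `FlushFeedback`, its methods, and the initial ratio are made up here, not Cassandra's actual API.

```java
// Hypothetical sketch: after each flush, fold the measured serialized
// size back into the live/serialized ratio estimate instead of waiting
// for the next periodic recalculation.
public class FlushFeedback {
    // Exponentially-weighted moving average of the live/serialized ratio.
    private volatile double liveToSerializedRatio = 10.0; // initial guess
    private static final double ALPHA = 0.25; // weight of the newest sample

    /** Called by the flusher once the sstable is on disk. */
    public void onFlushCompleted(long measuredSerializedBytes, long liveBytesAtFlush) {
        if (measuredSerializedBytes <= 0)
            return;
        double observed = (double) liveBytesAtFlush / measuredSerializedBytes;
        liveToSerializedRatio = ALPHA * observed + (1 - ALPHA) * liveToSerializedRatio;
    }

    /** Estimate live size for a memtable from its serialized-size counter. */
    public long estimateLiveBytes(long serializedBytes) {
        return (long) (serializedBytes * liveToSerializedRatio);
    }
}
```

With a scheme like this, every completed flush would pull the estimate toward what was actually observed on disk, instead of the ratio going stale between the expensive full recalculations.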
INFO [OptionalTasks:1] 2012-03-23 09:33:51,765 MeteredFlusher.java
(line 62) flushing high-traffic column family CFS(Keyspace='whois',
ColumnFamily='ipbans') (estimated 105363280 bytes)
INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java
(line 704) Enqueuing flush of
Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755
ops)
** Note here that the serialized/live sizes are ESTIMATES!! **
INFO [FlushWriter:314] 2012-03-23 09:33:51,796 Memtable.java (line
246) Writing Memtable-ipbans@481336682(1317041/105363280 serialized/live
bytes, 16755 ops)
INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line
283) Completed flushing
/var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)
Note the discrepancy: the memtable was estimated at 1317041 serialized bytes, but the sstable actually written is only 1355 bytes.