I wonder why the memtable size estimations are so bad.

1. Is it not possible to run them more often? There should be some limit, e.g. run the live/serialized calculation at least once per hour; it takes just a few seconds.

2. Why not use data from FlushWriter to update the estimations? The flusher knows the number of ops and the serialized size after the sstable is written to disk. These values could be used to update the memtable's live/serialized ratio.
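A minimal sketch of what point 2 could look like. This is not Cassandra's actual code; the class, method names, and the exponentially weighted moving average are all my assumptions, just to illustrate feeding the flusher's real numbers back into the ratio:

```java
// Hypothetical sketch, not Cassandra's real API: keep a live/serialized
// ratio and refresh it from actual flush results instead of (only)
// periodic live-size calculations.
public class LiveRatioEstimator {
    // Current live/serialized ratio; the initial value is a guess.
    private volatile double liveRatio = 10.0;
    // Weight given to the newest observation (assumed EWMA smoothing).
    private static final double ALPHA = 0.5;

    /** Called by the flush writer once an sstable has been written,
     *  with the measured live and serialized sizes of that memtable. */
    public synchronized void onFlushCompleted(long liveBytes, long serializedBytes) {
        if (serializedBytes <= 0)
            return; // nothing flushed; keep the previous ratio
        double observed = (double) liveBytes / serializedBytes;
        liveRatio = ALPHA * observed + (1 - ALPHA) * liveRatio;
    }

    /** Estimate live (heap) size from the serialized bytes tracked
     *  incrementally as writes are applied. */
    public long estimateLiveSize(long serializedBytes) {
        return (long) (serializedBytes * liveRatio);
    }
}
```

The idea is simply that every flush already produces a ground-truth sample, so the ratio would converge on real data instead of drifting between rare recalculations.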

INFO [OptionalTasks:1] 2012-03-23 09:33:51,765 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='whois', ColumnFamily='ipbans') (estimated 105363280 bytes)
INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
** It should be noted here that the serialized/live sizes are ESTIMATED!! **
INFO [FlushWriter:314] 2012-03-23 09:33:51,796 Memtable.java (line 246) Writing Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line 283) Completed flushing /var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)
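For illustration, here is the discrepancy worked out from the numbers in the log lines above (the constants are copied straight from the log; the class and method names are mine):

```java
// Quantify the gap between the estimated sizes and the flushed sstable,
// using the values from the MeteredFlusher/Memtable log lines above.
public class FlushDiscrepancy {
    /** Ratio the estimator applied: estimated live bytes per serialized byte. */
    static double liveRatio(long liveBytes, long serializedBytes) {
        return (double) liveBytes / serializedBytes;
    }

    /** How much larger the tracked serialized size is than the file on disk. */
    static double shrinkFactor(long serializedBytes, long onDiskBytes) {
        return (double) serializedBytes / onDiskBytes;
    }

    public static void main(String[] args) {
        long estimatedLive = 105_363_280L;   // "estimated 105363280 bytes"
        long trackedSerialized = 1_317_041L; // "1317041/... serialized/live bytes"
        long onDiskBytes = 1_355L;           // "Completed flushing ... (1355 bytes)"
        System.out.printf("live/serialized ratio used: %.0fx%n",
                liveRatio(estimatedLive, trackedSerialized));     // prints 80x
        System.out.printf("tracked serialized vs. on-disk: %.0fx%n",
                shrinkFactor(trackedSerialized, onDiskBytes));    // prints 972x
    }
}
```

So the estimator multiplied the tracked serialized size by 80, while the data actually written was nearly a thousand times smaller than even the tracked serialized size, which is exactly the kind of feedback the flusher could report back.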
