> 1. Is it not possible to run them more often? There should be some limit - run
> live/serialized calculation at least once per hour. They take just a few
> seconds.

The live ratio is updated every time the operation count (since startup) for the CF doubles.
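A minimal sketch of the doubling rule described above, assuming a simple per-CF operation counter. `LiveRatioTrigger` and `record_op` are hypothetical names, not Cassandra's Memtable code. It shows why updates become rare on a long-running node: once N operations have been counted since startup, the next recalculation only happens at 2N.

```python
class LiveRatioTrigger:
    """Hypothetical sketch: fire a live-ratio recalculation each time the
    operation count since startup doubles (not Cassandra's actual code)."""

    def __init__(self) -> None:
        self.ops = 0
        self.next_threshold = 1  # first recalculation fires on the first op
        self.fired_at = []       # op counts at which a recalculation fired

    def record_op(self) -> bool:
        """Count one operation; return True if the ratio should be recalculated."""
        self.ops += 1
        if self.ops >= self.next_threshold:
            self.fired_at.append(self.ops)
            self.next_threshold *= 2  # wait until the count doubles again
            return True
        return False


trigger = LiveRatioTrigger()
for _ in range(100):
    trigger.record_op()
print(trigger.fired_at)  # [1, 2, 4, 8, 16, 32, 64]
```

Because the gaps grow geometrically, the number of recalculations after N operations is only about log2(N), so a node that has been up for a long time may go hours between updates.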
> 2. Why not use data from FlushWriter to update estimations? The flusher knows
> the number of ops and the serialized size after the sstable is written to disk. These
> values should be used for updating the memtable live/serialized ratio.

The problem is tracking the live memory usage. The ops count and serialised bytes are already tracked by the memtable, so the flush adds nothing new there. Note that serialised bytes is the throughput bytes, not the amount that will be written to disk.

> INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java (line
> 704) Enqueuing flush of Memtable-ipbans@481336682(1317041/105363280
> serialized/live bytes, 16755 ops)
> ** It should be noted here that the live/serialized size is ESTIMATED!! **

Serialised is the serialised byte throughput for the memtable, including overwrites. The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) = 80, strange because the live ratio is capped at 64. Can you see any log messages about the live ratio for this CF?

> INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line 283)
> Completed flushing /var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db
> (1355 bytes)

The small file may be the result of a lot of overwrites and something odd happening with the live ratio. Is compression on?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/03/2012, at 9:44 PM, Radim Kolar wrote:

> I wonder why the memtable estimations are so bad.
>
> 1. Is it not possible to run them more often? There should be some limit - run
> live/serialized calculation at least once per hour. They take just a few
> seconds.
> 2. Why not use data from FlushWriter to update estimations? The flusher knows
> the number of ops and the serialized size after the sstable is written to disk. These
> values should be used for updating the memtable live/serialized ratio.
>
> INFO [OptionalTasks:1] 2012-03-23 09:33:51,765 MeteredFlusher.java (line 62)
> flushing high-traffic column family CFS(Keyspace='whois',
> ColumnFamily='ipbans') (estimated 105363280 bytes)
> INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java (line
> 704) Enqueuing flush of Memtable-ipbans@481336682(1317041/105363280
> serialized/live bytes, 16755 ops)
> ** It should be noted here that the live/serialized size is ESTIMATED!! **
> INFO [FlushWriter:314] 2012-03-23 09:33:51,796 Memtable.java (line 246)
> Writing Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes,
> 16755 ops)
> INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line 283)
> Completed flushing /var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db
> (1355 bytes)
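To illustrate why serialised bytes are throughput rather than on-disk size, here is a toy model, assuming a memtable that simply keeps the latest value per key. `ToyMemtable`, `throughput_bytes` and `flushed_bytes` are hypothetical names, not Cassandra classes, and the model ignores timestamps, row headers, index files and compression. With heavy overwrites, throughput can be many times larger than what a flush writes, which is one possible explanation for 1317041 serialised bytes ending up as a 1355-byte Data.db file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemtable:
    """Hypothetical toy model of a memtable (not Cassandra's Memtable class)."""
    ops: int = 0
    throughput_bytes: int = 0                 # every write counts, including overwrites
    rows: dict = field(default_factory=dict)  # only the latest value per key survives

    def write(self, key: str, value: bytes) -> None:
        self.ops += 1
        self.throughput_bytes += len(key.encode()) + len(value)
        self.rows[key] = value  # an overwrite replaces the previous value

    def flushed_bytes(self) -> int:
        # Only the surviving value for each key would reach the sstable.
        return sum(len(k.encode()) + len(v) for k, v in self.rows.items())


# 1000 writes spread over just 10 keys, each with a 100-byte value.
demo = ToyMemtable()
for i in range(1000):
    demo.write(f"ip{i % 10}", b"x" * 100)
print(demo.ops, demo.throughput_bytes, demo.flushed_bytes())  # 1000 103000 1030
```

Here the throughput is 100 times the flushed size, so a ratio derived from the flushed file would say little about either the write volume or the memory the memtable held.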