Thanks for the input. If that's the case, I think the solution would be to sort the CFs to flush by a more complex criterion than just size. For example, the number of dirty commit logs that contain this CF should be factored into the score.
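
Here is a rough sketch of what I mean, written in Python for brevity. It is not the actual MeteredFlusher logic. The names CfStats, pick_cf_to_flush and segment_weight, and the 128 MB segment size, are all made up for illustration. The score simply adds the commit log space a CF is pinning to its live memtable size.

```python
from dataclasses import dataclass


@dataclass
class CfStats:
    """Hypothetical per-CF bookkeeping; not a real Cassandra class."""
    name: str
    live_size: int       # bytes currently held in the CF's memtable
    dirty_segments: int  # commit log segments with unflushed writes for this CF


def pick_cf_to_flush(cfs, segment_size, segment_weight=1.0):
    """Return the CF with the highest flush score.

    score = live_size + segment_weight * dirty_segments * segment_size

    With segment_weight=0 this reduces to the size-only rule.
    """
    def score(cf):
        return cf.live_size + segment_weight * cf.dirty_segments * segment_size

    return max(cfs, key=score)


# Illustrative numbers only (assumed, not measured).
SEGMENT = 128 * 2**20  # assumed commit log segment size in bytes
busy = CfStats("busy", live_size=64 * 2**20, dirty_segments=1)
light = CfStats("light", live_size=64 * 2**10, dirty_segments=1000)

print(pick_cf_to_flush([busy, light], SEGMENT, segment_weight=0).name)  # busy
print(pick_cf_to_flush([busy, light], SEGMENT).name)                    # light
```

This overestimates the benefit when several CFs are dirty in the same segment, since a segment can only be deleted once every CF in it has been flushed. It is probably good enough as a tie-breaker, though.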
Yang

On Thu, Sep 22, 2011 at 10:40 PM, Philippe <watche...@gmail.com> wrote:
> It sure looks like what I'm seeing on my cluster, where a 100G commit log
> partition fills up in 12 hours (0.8.x)
>
> On Sep 23, 2011 at 03:45, "Yang" <teddyyyy...@gmail.com> wrote:
>> in 1.0.0 we don't have memtable_throughput for each individual CF,
>> and instead
>> which memtable/CF to flush is determined by "largest
>> getTotalMemtableLiveSize()".
>> (MeteredFlusher.java line 81)
>>
>>
>> what would happen in the following case? I have only 2 CFs, the
>> traffic for one CF is 1000 times that
>> of the second CF,
>> so the high-traffic CF constantly triggers the total mem threshold, and
>> every time, the busy CF is flushed.
>>
>> but the light-traffic CF is never flushed (well, until we have
>> flushed the busy CF about 1000 times),
>> now we are left with many commit logs, each of them containing a few
>> entries for the light-traffic table. we have to keep these commit logs
>> because these entries are not flushed to sstable yet.
>>
>> then there are 2 problems: 1) to persist the few records from the
>> light-traffic CF, you have to keep 1000 times the commit logs
>> necessary, taking up disk space 2) when you do a recovery on server
>> restart, you'll have to read through all those commit logs.
>>
>> does the above hypothesis sound right?
>>
>> Thanks
>> Yang
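
The hypothesis in the quoted message can be checked with a toy model. The sketch below is not Cassandra code. The simulate function, its parameters and the unit-sized writes are assumptions made only to illustrate the effect: writes go into fixed-capacity commit log segments, the largest memtable is flushed whenever the total crosses a threshold, and a segment is dropped once no CF has unflushed writes in it.

```python
def simulate(total_writes, ratio=1000, segment_capacity=10_000,
             flush_threshold=5_000, flush_all=False):
    """Return how many commit log segments are retained at the end.

    Toy model: every write has size 1, and one write in every (ratio + 1)
    goes to the "light" CF. With flush_all=True every CF is flushed each
    time the threshold is hit, as a baseline for comparison.
    """
    memtable = {"busy": 0, "light": 0}
    segments = [set()]  # per segment: the CFs with unflushed writes in it
    used = 0            # writes already appended to the active (last) segment

    for i in range(total_writes):
        cf = "light" if i % (ratio + 1) == ratio else "busy"

        # Roll over to a new segment when the active one is full.
        if used == segment_capacity:
            segments.append(set())
            used = 0
        segments[-1].add(cf)
        used += 1
        memtable[cf] += 1

        if sum(memtable.values()) >= flush_threshold:
            # Size-only rule: flush the largest memtable (or all of them).
            victims = list(memtable) if flush_all else [max(memtable, key=memtable.get)]
            for victim in victims:
                memtable[victim] = 0
                for seg in segments:
                    seg.discard(victim)
            # Drop fully clean segments, but always keep the active one.
            segments = [s for s in segments[:-1] if s] + [segments[-1]]

    return len(segments)


print(simulate(200_000))                  # 20: every segment is pinned by "light"
print(simulate(200_000, flush_all=True))  # 1: only the active segment remains
```

In this model the light CF never becomes the largest memtable, so every segment ever written stays on disk, which matches the behaviour described in the quoted message.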