thanks for the input.

if that's the case, I think the solution would be to sort the CFs to
flush by a more complex criterion than just size. for example, the
number of dirty commit log segments that still contain unflushed writes
for a CF could be factored into its score.
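
a rough sketch of what I mean, in Python just for brevity. all names
here (CF, flush_score, pick_cf_to_flush, the weight parameter, the
128 MB segment size) are made up for illustration and are not the
actual MeteredFlusher code. it also assumes we can cheaply ask, per CF,
which commit log segments it is still dirty in:

```python
from dataclasses import dataclass, field

MB = 1024 * 1024


@dataclass
class CF:
    """Hypothetical stand-in for a column family's flush-relevant state."""
    name: str
    live_size: int  # bytes currently held in the memtable
    # ids of commit log segments holding unflushed writes for this CF
    dirty_segments: set = field(default_factory=set)


def flush_score(cf, segment_size, weight=1.0):
    """Memtable size plus a weighted estimate of commit log space pinned by this CF."""
    pinned = len(cf.dirty_segments) * segment_size
    return cf.live_size + weight * pinned


def pick_cf_to_flush(cfs, segment_size, weight=1.0):
    """Pick the CF with the highest score instead of the largest live size."""
    return max(cfs, key=lambda cf: flush_score(cf, segment_size, weight))


SEGMENT_SIZE = 128 * MB  # assumed segment size, for illustration only

# busy CF: big memtable, but dirty only in the newest segment
busy = CF("busy", live_size=64 * MB, dirty_segments={1000})
# light CF: tiny memtable, but a few entries in each of 1000 segments
light = CF("light", live_size=64 * 1024, dirty_segments=set(range(1, 1001)))

by_size = max([busy, light], key=lambda cf: cf.live_size)
by_score = pick_cf_to_flush([busy, light], SEGMENT_SIZE)
print(by_size.name, by_score.name)  # prints: busy light
```

the pinned-space estimate is optimistic, since a segment can only be
discarded once no CF is dirty in it, so the weight would need tuning
(weight=0 gives back the size-only behaviour).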

Yang

On Thu, Sep 22, 2011 at 10:40 PM, Philippe <watche...@gmail.com> wrote:
> It sure looks like what I'm seeing on my cluster, where a 100G commit log
> partition fills up in 12 hours (0.8.x)
>
> On 23 Sept. 2011 03:45, "Yang" <teddyyyy...@gmail.com> wrote:
>> in 1.0.0 we don't have memtable_throughput for each individual CF;
>> instead, which memtable/CF to flush is determined by the "largest
>> getTotalMemtableLiveSize()".
>> (MeteredFlusher.java line 81)
>>
>>
>> what would happen in the following case? I have only 2 CFs, and the
>> traffic for one CF is 1000 times that of the second CF,
>> so the high-traffic CF constantly triggers the total memory threshold,
>> and every time, the busy CF is flushed.
>>
>> but the light-traffic CF is never flushed (well, until we have
>> flushed the busy CF about 1000 times).
>> now we are left with many commit logs, each of them containing a few
>> entries for the light-traffic CF. we have to keep these commit logs
>> because these entries have not been flushed to an sstable yet.
>>
>> then there are 2 problems: 1) to persist the few records from the
>> light-traffic CF, you have to keep 1000 times as many commit logs as
>> necessary, taking up disk space; 2) when you recover on server
>> restart, you'll have to read through all those commit logs.
>>
>> does the above hypothesis sound right?
>>
>> Thanks
>> Yang
>
