On 9/20/10 12:47 PM, Peter Schuller wrote:
This drawback is unfortunate for systems that use time-based row keys. In
such systems, row data will generally not be fragmented very much, if at
all, but reads suffer because the assumption is that all data is fragmented.
Even further, in a real-time system where reads occur quickly after
writes, if the data is in memory, the sstables are still checked.
Perhaps I am misunderstanding you, but why is this a problem (in the
particular case of time-based row keys), given the existence of the
bloom filters? They should eliminate the need to go down to the
sstables except for those that actually contain data for the row (in
almost all cases, subject to bloom filter false positives).
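As a rough sketch of the read path being described here (the names, such as
might_contain, are illustrative and not Cassandra's actual API): each sstable
carries a bloom filter, and a "no" answer from the filter is definitive, so
disk reads only happen for sstables that probably contain the row.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hash positions over an m-bit field."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # "False" is definitive; "True" may be a false positive.
        return all(self.bits & (1 << pos) for pos in self._positions(key))

class SSTable:
    """Toy sstable: an in-memory dict standing in for on-disk data."""
    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.reads = 0  # counts simulated disk reads

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

def read_row(key, memtable, sstables):
    """Check the memtable, then only those sstables whose bloom
    filter says the key might be present."""
    fragments = []
    if key in memtable:
        fragments.append(memtable[key])
    for sst in sstables:
        if sst.bloom.might_contain(key):   # skip sstable on a definite "no"
            frag = sst.read(key)           # "disk" I/O only on a probable hit
            if frag is not None:
                fragments.append(frag)
    return fragments
```

So a lookup for a row that lives only in the memtable will, in almost all
cases, touch no sstable at all; the cost of the extra sstables is a bloom
filter probe each, not a disk read.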
Also, for the edge case where memtables have just been flushed, a
write-through row cache should help alleviate that. I forget offhand
whether the row cache is in fact write-through or not, though.
Hi
Actually, the points you make are things I had overlooked, and they
make me feel more comfortable about how Cassandra will perform for my
use cases. In my case, I'm interested to find out what the bloom
filter false-positive rate is; hopefully a stat is kept on this. As
long as ALL of the bloom filters are in memory, the hit from a false
positive should be minimal, since the index read will subsequently
reveal that the row is not in the corresponding SSTable.
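For a rough sense of what that false-positive rate looks like, the standard
bloom-filter estimate is p ≈ (1 - e^(-kn/m))^k for n keys, m bits, and k hash
functions. The parameters below (10 bits per key, 7 hashes) are just an
illustration, not Cassandra's actual configuration:

```python
import math

def bloom_fp_rate(n, m, k):
    """Standard estimate of bloom filter false-positive probability
    for n inserted keys, m bits of filter, and k hash functions."""
    if n == 0:
        return 0.0
    return (1.0 - math.exp(-k * n / m)) ** k

# Illustrative sizing: 10 bits per key, 7 hash functions.
rate = bloom_fp_rate(n=100_000, m=1_000_000, k=7)
print(f"{rate:.4f}")  # roughly 0.0082, i.e. under 1% of lookups
```

So at that sizing, well under one percent of lookups would pay for a
needless index read, which supports the point that the penalty is small
as long as the filters themselves stay in memory.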
Good point on the row cache. I had actually misread the comment in
the yaml, mistaking "do not use on ColumnFamilies with LARGE ROWS" for
"do not use on ColumnFamilies with a LARGE NUMBER OF ROWS". I don't
know if this will improve performance much, since I don't yet
understand whether it eliminates the need to check for the data in the
SSTables. If it doesn't, then what is the point of the row cache,
given that the data is also in an in-memory memtable?
That aside, splitting the memtable in two could make checking the
bloom filters unnecessary in most cases for me, but I'm not sure it's
worth the effort.