improving read performance

Carl Bruecken Mon, 20 Sep 2010 07:49:32 -0700

The Cassandra FAQ answers the question of why reads are slower than writes as follows:


http://wiki.apache.org/cassandra/FAQ#reads_slower_writes

This drawback is unfortunate for systems that use time-based row keys. In such systems, row data will generally not be fragmented very much, if at all, but reads suffer because the read path assumes that all data is fragmented. Furthermore, in a real-time system where reads occur shortly after writes, the sstables are still checked even if the data is in memory.

I've been working on a patch that I hope will make read performance comparable to write performance, if not faster in the cases where no disk access is involved for the reads. The assumption is that for a time-based row key the data will be fragmented only at the edges of memtable flushes. Therefore, only 2 reads need to occur: either against the current memtable in memory and the newest sstable, or against 2 adjacent sstables. In the case of real-time reads, I've further split the single memtable into 2 memtables so that the 2 required reads will happen against 2 memtables. The read algorithm is to search for the first fragment until it is found, and then to read only from the adjacent memtable or sstable.
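To make the read algorithm concrete, here is a minimal sketch in Python rather than the actual patch. Everything in it is an assumption made for illustration: plain dicts stand in for memtables and sstables, the stores are ordered newest first, the search runs from newest to oldest, and columns in the newer store win on merge. The names read_all_stores and read_adjacent are hypothetical and do not exist in Cassandra.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]        # column name -> value
Store = Dict[str, Row]      # row key -> row; stands in for a memtable or sstable


def read_all_stores(stores: List[Store], key: str) -> Optional[Row]:
    """Baseline: consult every store, as a fully fragmented row would require."""
    merged: Row = {}
    found = False
    # Walk oldest to newest so that newer columns overwrite older ones.
    for store in reversed(stores):
        if key in store:
            merged.update(store[key])
            found = True
    return merged if found else None


def read_adjacent(stores: List[Store], key: str) -> Optional[Row]:
    """Sketch of the proposed read: find the first fragment, then read
    only the adjacent (next older) store and stop.

    `stores` is assumed to be ordered newest first, for example
    [memtable_1, memtable_2, newest_sstable, older_sstable, ...].
    """
    for i, store in enumerate(stores):
        if key in store:
            merged: Row = {}
            # Only the next older store can hold the other fragment,
            # under the assumption that rows split only at flush edges.
            if i + 1 < len(stores) and key in stores[i + 1]:
                merged.update(stores[i + 1][key])
            merged.update(store[key])  # newer columns win
            return merged
    return None


if __name__ == "__main__":
    stores = [
        {"t3": {"c": "3"}, "t2": {"b": "new"}},   # newest store
        {"t2": {"a": "old", "b": "old"}},         # adjacent, older store
        {"t1": {"x": "1"}},                       # oldest store
    ]
    print(read_adjacent(stores, "t2"))
    print(read_all_stores(stores, "t2"))
```

For a row that straddles one flush boundary, such as "t2" above, both functions return the same merged row, but read_adjacent touches at most 2 stores once the first fragment is found. The sketch gives the same answer as the baseline only while the fragmentation assumption holds: a row with fragments in non-adjacent stores would be read incompletely.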

I haven't uncovered any showstoppers with this approach yet. I'm hoping that by posting this message, someone might alert me if they detect any flaws with this approach.
