Re: SSTable format

Dave Brosius Fri, 13 Jul 2012 17:19:37 -0700

On 07/13/2012 08:00 PM, Michael Theroux wrote:

Hello,


I've been trying to understand in greater detail how SSTables are stored, and 
how information is transferred between Cassandra nodes, especially when a new 
node is joining a cluster.

Specifically, is information stored in SSTables ordered by rowkey?  Some of 
the articles I've read suggest this is the case (although it's a little vague 
whether they actually mean that the columns are stored in order, not the rowkeys).  
However, if data is stored in rowkey order, how is this achieved, as sstables 
are immutable?

Thanks for any insights,
-Mike

It depends on what partitioner you use. You should be using the RandomPartitioner, and if so, the rows are sorted by the hash of the rowkey. There are partitioners that sort based on the raw key value, but these partitioners shouldn't be used as they have problems due to uneven partitioning of data.
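
To make the ordering concrete, here is a rough Python sketch of "sorted by the hash of the rowkey". RandomPartitioner derives its tokens from an MD5 hash of the key; the token() function below is a simplified stand-in I made up for illustration, not Cassandra's exact computation, and the sample keys are invented.

```python
import hashlib


def token(row_key: bytes) -> int:
    """Hypothetical stand-in for a partitioner token: the MD5 digest read as an integer."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")


def sort_by_token(row_keys):
    """Order rowkeys by their hash rather than by their raw byte value."""
    return sorted(row_keys, key=token)


keys = [b"alice", b"bob", b"carol", b"dave"]
by_token = sort_by_token(keys)  # hash order, generally unrelated to alphabetical order
```

Because the hash scatters keys more or less uniformly, neighbouring raw keys land far apart in token space, which is what evens out the partitioning.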

As for how this is done, remember an sstable doesn't hold all the data for a column family. Not only does the data for a column family exist on multiple servers, there are usually multiple sstable files on disk that represent data from one column family on one machine. So at the time the sstable is written, the rows that are to be put in the sstable are sorted, and written in sorted order. In fact, the same rowkey may be written in multiple sstables, one sstable having one set of columns for the key, the other sstable having other columns for the same key.
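
A minimal sketch of that write path, assuming the pending rows sit in an ordinary dict before being written out. The flush() function and the sample rows are hypothetical, and it sorts by the raw key for brevity; with the RandomPartitioner the sort key would be the hash of the rowkey instead.

```python
def flush(pending_rows):
    """Write a batch of rows out as one immutable, sorted run (a tuple of rows).

    pending_rows maps row_key -> {column_name: value}. The result is never
    modified afterwards; later writes go into a new run instead.
    """
    return tuple(
        (row_key, tuple(sorted(columns.items())))
        for row_key, columns in sorted(pending_rows.items())
    )


# Two separate flushes: "user1" ends up in both runs, with different columns.
first = flush({"user2": {"name": "Bo"}, "user1": {"name": "Al"}})
second = flush({"user1": {"email": "al@example.com"}})
```

Each run is sorted on its own at the moment it is written, so immutability is not a problem: nothing ever has to be inserted into the middle of an existing file.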

When a row is queried by key, Cassandra is responsible for finding which sstables (potentially several) hold columns for that key and merging the results.
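
The read side could be sketched roughly like this, treating each sstable as a sorted list of (row_key, columns) pairs. find_row() and read_row() are names I invented for the example. The merge rule here (runs listed later overwrite earlier ones) is a simplifying assumption; Cassandra itself reconciles columns using per-column timestamps.

```python
from bisect import bisect_left


def find_row(sstable, row_key):
    """Binary-search one sorted run of (row_key, columns) pairs; return columns or None."""
    keys = [k for k, _ in sstable]
    i = bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return sstable[i][1]
    return None


def read_row(sstables, row_key):
    """Collect the columns for row_key from every run that has the row and merge them."""
    merged = {}
    for sstable in sstables:
        columns = find_row(sstable, row_key)
        if columns is not None:
            merged.update(columns)  # assumption: later runs win on conflicts
    return merged


older = [("user1", {"name": "Al"}), ("user2", {"name": "Bo"})]
newer = [("user1", {"email": "al@example.com"})]
row = read_row([older, newer], "user1")
```

The sorted order inside each run is what makes the per-sstable lookup cheap; the merge step is what lets a single logical row be spread across several files.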
