SSTables are immutable - once they're written to disk, they cannot be changed.

On read, C* checks *all* SSTables [1], but to make this faster it uses Bloom filters, which can tell you that a row is *not* in a specific SSTable, so you don't have to read that SSTable at all. If you do have to read one, you don't read the whole SSTable: there's an in-memory Index Sample, which is binary-searched to locate a (relatively) small block of the real (full, on-disk) index, and you scan that block to find the position of the data in the SSTable. Additionally, there's a KeyCache to make reads faster: it stores the location of the data in the SSTable, so on a hit you don't have to touch the Index Sample or the Index at all.
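To make the lookup order concrete, here is a toy sketch in Python of a read against a single SSTable: Bloom filter first, then the KeyCache, then binary search over the Index Sample, then a scan of one small index block. It is not Cassandra code; `ToyBloomFilter`, `ToySSTable`, `sample_every` and `index_scans` are names made up for this illustration, and the real on-disk formats are more involved.

```python
import bisect
import hashlib


class ToyBloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive several bit positions per key from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToySSTable:
    """Hypothetical in-memory stand-in for one immutable SSTable."""

    def __init__(self, rows, sample_every=4):
        # "Data file": rows sorted by key, never modified after construction.
        self.data = sorted(rows.items())
        # "Full index": one (key, offset into data) entry per row.
        self.index = [(key, offset) for offset, (key, _) in enumerate(self.data)]
        # "Index Sample": every Nth key of the full index, kept in memory.
        self.sample_every = sample_every
        self.sample_keys = [self.index[i][0]
                            for i in range(0, len(self.index), sample_every)]
        self.bloom = ToyBloomFilter()
        for key in rows:
            self.bloom.add(key)
        self.key_cache = {}   # key -> offset into data
        self.index_scans = 0  # how many index blocks were scanned

    def get(self, key):
        # 1. Bloom filter: a "no" means the key is definitely not here.
        if not self.bloom.might_contain(key):
            return None
        # 2. KeyCache hit: jump straight to the data, skipping both indexes.
        if key in self.key_cache:
            return self.data[self.key_cache[key]][1]
        # 3. Binary search the sample to pick one block of the full index.
        slot = bisect.bisect_right(self.sample_keys, key) - 1
        if slot < 0:
            return None
        start = slot * self.sample_every
        # 4. Scan only that block to find the data offset.
        self.index_scans += 1
        for indexed_key, offset in self.index[start:start + self.sample_every]:
            if indexed_key == key:
                self.key_cache[key] = offset
                return self.data[offset][1]
        return None  # Bloom filter false positive
```

Reading the same key twice scans an index block only the first time; the second read is served through the cache.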

Once C* retrieves all data "parts" (including the Memtable part), timestamps are used to find the most recent version of data.
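A minimal sketch of that last step, assuming each "part" (the Memtable or one SSTable) is modelled as a dict mapping a column name to a `(timestamp, value)` pair. `merge_row` is a hypothetical helper for illustration only; it ignores tombstones, and on equal timestamps it simply keeps the first cell seen, which is a simplification of this sketch.

```python
def merge_row(fragments):
    """Merge row fragments, keeping the newest cell for every column.

    fragments: iterable of dicts, column name -> (timestamp, value).
    """
    merged = {}
    for fragment in fragments:
        for column, (timestamp, value) in fragment.items():
            current = merged.get(column)
            # Keep the cell with the highest timestamp seen so far.
            if current is None or timestamp > current[0]:
                merged[column] = (timestamp, value)
    return {column: value for column, (_, value) in merged.items()}


# The example from the question: "abc" written first, "def" written later.
older_sstable = {"col": (1, "abc")}
newer_sstable = {"col": (2, "def")}
print(merge_row([older_sstable, newer_sstable]))
```

The order in which the parts are read does not matter here, because only the timestamps decide which value wins.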

[1] I believe this isn't true in all cases: I saw a piece of code somewhere in the source that checks SSTables in order from the newest to the oldest (in terms of data timestamps; AFAIR the SSTable metadata stores the smallest and largest timestamp in the SSTable). Once the newest data for all columns has been retrieved (assuming the schema is defined), retrieval stops and older SSTables are not checked. If someone could confirm that it works this way, and that it's not something I saw in a dream and now believe is real, I'd be glad ;-)
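To show what I mean, here is a sketch of how such a newest-first read could work. This is only an illustration of the idea, not confirmed Cassandra behaviour: `read_newest_first` and the `max_timestamp` / `columns` dict layout are invented for the example. An SSTable is skipped for a column once we already hold a cell newer than anything that SSTable can contain.

```python
def read_newest_first(sstables, wanted_columns):
    """Read wanted columns, visiting SSTables from newest to oldest.

    sstables: list of dicts with
        'max_timestamp': largest cell timestamp stored in that SSTable
        'columns': column name -> (timestamp, value)
    Returns (values, number_of_sstables_checked).
    """
    result = {}
    checked = 0
    by_newest = sorted(sstables, key=lambda s: s["max_timestamp"], reverse=True)
    for sstable in by_newest:
        # A column still needs this SSTable only if it has not been found yet,
        # or if this SSTable could hold a cell at least as new as the one found.
        remaining = [c for c in wanted_columns
                     if c not in result
                     or result[c][0] <= sstable["max_timestamp"]]
        if not remaining:
            break  # every older SSTable can only hold older data
        checked += 1
        for column in remaining:
            cell = sstable["columns"].get(column)
            if cell is not None and (column not in result
                                     or cell[0] > result[column][0]):
                result[column] = cell
    return {c: v for c, (_, v) in result.items()}, checked
```

With three SSTables where the two newest already hold the latest cells for every requested column, the oldest one is never opened.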

On 17.07.2013 22:58, S Ahmed wrote:
Since SSTables are mutable, and they are ordered, does this mean that there
is an index of key ranges that each SSTable holds, and the value could be 1 or
more sstables that have to be scanned and then the latest one is chosen?

e.g. Say I write a value "abc" to CF1.  This gets stored in a sstable.

Then I write "def" to CF1, this gets stored in another sstable eventually.

Now when I go to fetch the value, it has to scan 2 sstables and then figure
out which is the latest entry, correct?

So is there an index of keys to sstables, and there can be 1 or more
sstables per key?

(This is assuming compaction hasn't occurred yet).

