SSTables are immutable - once they're written to disk, they cannot be
changed.
On a read, C* checks *all* SSTables [1]. To make this faster, it uses
Bloom filters, which can tell you that a row is *not* in a specific
SSTable, so you don't have to read that SSTable at all. If you do have
to read it, you don't read the whole SSTable: an in-memory Index Sample
is binary-searched to locate a (relatively) small block of the real
(full, on-disk) index, which you then scan to find the position of the
data in the SSTable.
Additionally, there is a KeyCache to make reads faster: it points to the
location of the data in the SSTable, so you don't have to touch the
Index Sample or the Index at all.
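To make the lookup order above concrete, here is a minimal toy sketch in Python. It is not Cassandra's code: the class names (ToyBloomFilter, ToySSTable), the sample_every parameter and the in-memory lists standing in for on-disk files are all assumptions made up for illustration.

```python
import bisect
import hashlib


class ToyBloomFilter:
    """Tiny Bloom filter: can answer "definitely absent" or "maybe present"."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToySSTable:
    """Immutable sorted table with a full index and a sparse index sample."""

    def __init__(self, rows, sample_every=4):
        # "Data file": rows sorted by key; the list position plays the file offset.
        self.data = sorted(rows.items())
        # Full "on-disk" index: one (key, offset) entry per row.
        self.index = [(key, off) for off, (key, _) in enumerate(self.data)]
        # In-memory index sample: the key of every Nth index entry.
        self.sample_every = sample_every
        self.sample_keys = [key for key, _ in self.index[::sample_every]]
        self.bloom = ToyBloomFilter()
        for key, _ in self.data:
            self.bloom.add(key)
        self.key_cache = {}             # key -> offset in the data file
        self.index_entries_scanned = 0  # counter, only to show the savings

    def get(self, key):
        # 1. Bloom filter: skip this SSTable if the key is definitely absent.
        if not self.bloom.might_contain(key):
            return None
        # 2. Key cache: jump straight to the data, bypassing both indexes.
        if key in self.key_cache:
            return self.data[self.key_cache[key]][1]
        # 3. Binary search the index sample to pick one block of the full index.
        block = bisect.bisect_right(self.sample_keys, key) - 1
        if block < 0:
            return None  # key sorts before the first key in this table
        start = block * self.sample_every
        # 4. Scan only that block of the full index.
        for index_key, offset in self.index[start:start + self.sample_every]:
            self.index_entries_scanned += 1
            if index_key == key:
                self.key_cache[key] = offset
                return self.data[offset][1]
        return None  # the Bloom filter gave a false positive


table = ToySSTable({f"k{i:02d}": i for i in range(20)})
first = table.get("k07")   # goes through the sample and one index block
second = table.get("k07")  # served via the key cache, no index scan
```

In this toy the second lookup of the same key does not touch either index, which is the point of the KeyCache described above.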
Once C* retrieves all data "parts" (including the Memtable part),
timestamps are used to find the most recent version of data.
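As a rough sketch of that last step (again a toy, not Cassandra's code): assume every "part" is a dict mapping a column name to a (value, timestamp) pair. The function name reconcile and this data layout are my assumptions, and tombstones and timestamp ties are ignored.

```python
def reconcile(fragments):
    """Merge row fragments, keeping the highest-timestamp cell per column."""
    merged = {}
    for fragment in fragments:
        for column, (value, timestamp) in fragment.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return merged


# The example from the question: "abc" is written first, "def" later,
# and the two writes end up in different SSTables.
older_sstable = {"value": ("abc", 100)}
newer_sstable = {"value": ("def", 200)}
memtable = {"other": ("x", 150)}

row = reconcile([older_sstable, newer_sstable, memtable])
```

The order in which the parts are passed in does not matter here, only the timestamps do.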
[1] I believe that this is not true in all cases: I saw a piece of code
somewhere in the source that checks SSTables in order from the newest
to the oldest one (in terms of data timestamps; AFAIR the SSTable
metadata stores the smallest and largest timestamp in the SSTable).
Once the newest data for all columns has been retrieved (assuming that
a schema is defined), retrieval stops and older SSTables are not
checked. If someone could confirm that it works this way and it's not
something that I saw in my dream and now believe is real, I'd be glad ;-)
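Here is a toy sketch of the behaviour I am describing in [1], so it is clear what I am asking about. It models my recollection, not the actual Cassandra source: read_newest_first, the (max_timestamp, cells) tuples and the strict ">" comparison are all assumptions.

```python
def read_newest_first(sstables, wanted_columns):
    """Read SSTables from newest to oldest and stop as early as possible.

    sstables: list of (max_timestamp, {column: (value, timestamp)}).
    Returns the merged cells and the number of SSTables actually read.
    """
    result = {}
    checked = 0
    newest_first = sorted(sstables, key=lambda table: table[0], reverse=True)
    for max_timestamp, cells in newest_first:
        # If every wanted column already has a cell newer than anything this
        # SSTable (and therefore every older one) can hold, stop reading.
        if all(col in result and result[col][1] > max_timestamp
               for col in wanted_columns):
            break
        checked += 1
        for col in wanted_columns:
            if col in cells and (col not in result
                                 or cells[col][1] > result[col][1]):
                result[col] = cells[col]
    return result, checked


sstables = [
    (100, {"a": ("old", 100), "b": ("old", 90)}),
    (300, {"a": ("new", 300)}),
    (200, {"b": ("mid", 200)}),
]
cells, sstables_read = read_newest_first(sstables, ["a", "b"])
```

With this input only the two newest SSTables are read; the one whose largest timestamp is 100 is skipped because both columns already have newer values.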
On 17.07.2013 22:58, S Ahmed wrote:
Since SSTables are mutable, and they are ordered, does this mean that there
is an index of key ranges that each sstable holds, and the value could be in
1 or more sstables that have to be scanned and then the latest one is chosen?
e.g. Say I write a value "abc" to CF1. This gets stored in a sstable.
Then I write "def" to CF1, this gets stored in another sstable eventually.
Now when I go to fetch the value, it has to scan 2 sstables and then figure
out which is the latest entry, correct?
So is there an index of keys to sstables, and there can be 1 or more
sstables per key?
(This is assuming compaction hasn't occurred yet).