Thanks! :-)
M.
On 18.07.2013 08:42, Jean-Armel Luce wrote:
@Michal: take a look at this for the improvement of read performance:
https://issues.apache.org/jira/browse/CASSANDRA-2498
Best regards.
Jean Armel
2013/7/18 Michał Michalski <mich...@opera.com>
SSTables are immutable - once they're written to disk, they cannot be
changed.
On read, C* checks *all* SSTables [1], but to make this faster it uses Bloom
Filters, which can tell you that a row is *not* in a specific SSTable, so you
don't have to read that SSTable at all. When you do have to read one, you
don't read the whole SSTable: there's an in-memory Index Sample, which is
binary-searched to locate a (relatively) small block of the real (full,
on-disk) index, and you scan that block to find the position from which to
retrieve the data in the SSTable. Additionally, there's a KeyCache to make
reads faster: it stores the location of the data in the SSTable, so you
don't have to touch the Index Sample or the Index at all.
Once C* retrieves all data "parts" (including the Memtable part),
timestamps are used to find the most recent version of data.
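
To make the steps above concrete, here is a minimal Python sketch of that read path. It is a toy model, not Cassandra's actual code: the names `SSTable`, `index_sample`, `sample_every`, `key_cache` and `read` are made up for illustration, a plain set stands in for the Bloom filter, and in-memory lists stand in for the on-disk index and data.

```python
from bisect import bisect_right


class SSTable:
    """Toy immutable SSTable mapping row key -> (value, timestamp)."""

    def __init__(self, rows, sample_every=2):
        items = sorted(rows.items())
        self.keys = [k for k, _ in items]    # stands in for the full on-disk index
        self.cells = [c for _, c in items]   # stands in for the on-disk data
        self.sample_every = sample_every
        self.index_sample = self.keys[::sample_every]  # in-memory: every Nth key
        # A real Bloom filter can return false positives; a set cannot, but it
        # answers the same question: "is this key definitely absent?"
        self.bloom = frozenset(self.keys)

    def get(self, key, key_cache):
        if key not in self.bloom:
            return None                      # skip this SSTable entirely
        pos = key_cache.get((id(self), key))
        if pos is None:
            # Binary search the sample to pick one block of the full index.
            block = bisect_right(self.index_sample, key) - 1
            if block < 0:
                return None
            start = block * self.sample_every
            end = min(start + self.sample_every, len(self.keys))
            # Scan only that small block of the full index.
            pos = next((i for i in range(start, end) if self.keys[i] == key), None)
            if pos is None:
                return None
            key_cache[(id(self), key)] = pos  # next read skips sample and index
        return self.cells[pos]


def read(key, memtable, sstables, key_cache):
    """Collect every 'part' of the row, then keep the newest by timestamp."""
    parts = [memtable[key]] if key in memtable else []
    for table in sstables:
        cell = table.get(key, key_cache)
        if cell is not None:
            parts.append(cell)
    if not parts:
        return None
    value, _timestamp = max(parts, key=lambda cell: cell[1])
    return value


older = SSTable({"k1": ("abc", 1), "k2": ("x", 1), "k3": ("y", 1)})
newer = SSTable({"k1": ("def", 2)})
print(read("k1", {}, [older, newer], {}))  # prints "def"
```

Both SSTables hold a version of "k1", and the one with the larger timestamp wins, which is the "abc" then "def" case from the original question.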
[1] I believe that it's not true for all cases, as I saw a piece of code
somewhere in the source, that starts checking SSTables in order from the
newest to the oldest one (in terms of data timestamps - AFAIR SSTable
MetaData stores info about smallest and largest timestamp in SSTable), and
once the newest data for all columns are retrieved (assuming that schema is
defined), retrieving data stops and older SSTables are not checked. If
someone could confirm that it works this way and it's not something that I
saw in my dream and now believe it's real, I'd be glad ;-)
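
For illustration only, the behaviour described in [1] could look roughly like the Python sketch below. This is not taken from the Cassandra source: `SSTable`, `max_timestamp` and `read_newest_first` are hypothetical names, and the exact stop condition (every wanted column already has a value newer than anything the next SSTable can contain) is my assumption about how such a check would have to work.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Toy SSTable holding one row: column -> (value, timestamp). Must be non-empty."""
    cells: dict
    max_timestamp: int = field(init=False)  # metadata: largest timestamp inside

    def __post_init__(self):
        self.max_timestamp = max(ts for _, ts in self.cells.values())


def read_newest_first(wanted_columns, sstables):
    """Return ({column: value}, number_of_sstables_checked)."""
    result = {}
    checked = 0
    # Visit SSTables from the newest data to the oldest.
    for table in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        have_all = all(col in result for col in wanted_columns)
        # Stop once nothing in this (or any older) SSTable can be newer
        # than what has already been collected.
        if have_all and all(result[col][1] > table.max_timestamp
                            for col in wanted_columns):
            break
        checked += 1
        for col in wanted_columns:
            cell = table.cells.get(col)
            if cell is not None and (col not in result or cell[1] > result[col][1]):
                result[col] = cell
    return {col: value for col, (value, _) in result.items()}, checked


tables = [
    SSTable({"name": ("old", 1), "age": (30, 1)}),
    SSTable({"name": ("mid", 5)}),
    SSTable({"name": ("new", 10), "age": (31, 9)}),
]
print(read_newest_first(["name", "age"], tables))
```

In this example only the newest SSTable needs to be checked, because both columns found there are newer than the largest timestamp in the remaining two.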
On 17.07.2013 22:58, S Ahmed wrote:
Since SSTables are immutable, and they are ordered, does this mean that
there is an index of key ranges that each SSTable holds, and the value could
be in 1 or more sstables that have to be scanned, and then the latest one is
chosen?
e.g. Say I write a value "abc" to CF1. This gets stored in an sstable.
Then I write "def" to CF1; this gets stored in another sstable eventually.
Now when I go to fetch the value, it has to scan 2 sstables and then figure
out which is the latest entry, correct?
So is there an index of keys to sstables, and there can be 1 or more
sstables per key?
(This is assuming compaction hasn't occurred yet).