The partition index is never updated, as sstables are immutable.

On Tue, Mar 21, 2017 at 9:40 AM preetika tyagi <preetikaty...@gmail.com> wrote:
> Thank you Jan & Jeff for the responses. That was really useful.
>
> Jan - I have one follow-up question. When the data is spread over more
> than one SSTable in case of updates, as you mentioned, we will need two
> seeks per SSTable (one for the partition index and another for the SSTable
> itself). I'm curious to know how the partition index is structured
> internally. I was assuming it to be a table of <key, disk offset> pairs.
> When the same key is updated several times, how is that recorded in the
> partition index?
>
> Thanks,
> Preetika
>
> On Mon, Mar 20, 2017 at 10:37 PM, <j.kes...@enercast.de> wrote:
>
> > Hi,
> >
> > You're right: one seek with a hit in the partition key cache, and two
> > if not.
> >
> > That's the theory, but there are two things to mention:
> >
> > First, you need two seeks per sstable, not per entire read. So if your
> > data is spread over multiple sstables on disk, you obviously need more
> > than two reads. Think of frequently updated partition keys: in
> > combination with memory pressure you can easily end up with many
> > sstables (OK, they will be compacted at some time in the future).
> >
> > Second, there could be fragmentation on disk, which leads to seeks
> > during sequential reads.
> >
> > Jan
> >
> > Sent from my Windows 10 Phone
> >
> > *From:* preetika tyagi <preetikaty...@gmail.com>
> > *Sent:* Monday, March 20, 2017 21:18
> > *To:* user@cassandra.apache.org
> > *Subject:* question on maximum disk seeks
> >
> > I'm trying to understand the maximum number of disk seeks required in a
> > read operation in Cassandra. I looked at several online articles,
> > including this one:
> > https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html
> >
> > As per my understanding, two disk seeks are required in the worst case.
> > One is for reading the partition index and another is to read the actual
> > data from the compressed partition. The index of the data in compressed
> > partitions is obtained from the compression offset tables (which are
> > stored in memory). Am I on the right track here? Will there ever be a
> > case when more than 1 disk seek is required to read the data?
> >
> > Thanks,
> >
> > Preetika
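
For anyone following the thread, here is a rough toy model of the point made above: every sstable is written once together with its own <key, offset> index, so repeated updates to a key end up as separate entries in separate sstables, and a read pays up to two seeks for each sstable that holds the key. This is only a sketch, not Cassandra code. The names SSTable, read and key_cache are made up for illustration, the bloom filter is replaced by an exact membership check, and compression offsets and on-disk fragmentation are ignored.

```python
class SSTable:
    """Toy immutable sstable: a data area plus its own partition index."""

    def __init__(self, rows):
        # rows: {partition_key: (timestamp, value)}; written once, never changed.
        self._data = [(k, ts, v) for k, (ts, v) in sorted(rows.items())]
        # Partition index for THIS sstable only: key -> offset into _data.
        self._index = {k: off for off, (k, _, _) in enumerate(self._data)}


def read(key, sstables, key_cache):
    """Return (newest value, simulated seek count) for a partition key."""
    seeks = 0
    newest = None
    for n, table in enumerate(sstables):
        # Stand-in for the bloom filter, assumed perfect in this sketch.
        if key not in table._index:
            continue
        if (n, key) in key_cache:
            offset = key_cache[(n, key)]
            seeks += 1  # key cache hit: data seek only
        else:
            offset = table._index[key]
            key_cache[(n, key)] = offset
            seeks += 2  # index seek + data seek
        _, ts, value = table._data[offset]
        if newest is None or ts > newest[0]:
            newest = (ts, value)  # last write wins across sstables
    return (newest[1] if newest else None), seeks


# Three flushes, each producing a new sstable; "k1" is updated every time.
sstables = [
    SSTable({"k1": (1, "a"), "k2": (1, "x")}),
    SSTable({"k1": (2, "b")}),
    SSTable({"k1": (3, "c")}),
]
cache = {}
first = read("k1", sstables, cache)   # cold key cache
second = read("k1", sstables, cache)  # warm key cache
```

In this model no index entry is ever rewritten: the three versions of "k1" live in three sstables until compaction (not modelled here) merges them.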