> >> worst case is 2 or 3, depending on row size:
> >>
> >> one seek to read the right row index block
> >> one seek to read the row header (bloom filter + column index)
> >> if it's a big row, one seek to read the column block (block size is
> >> configurable, default is 256KB)
> >
> > [This is all per-sstable that contains the row]
>

I'm confused. That's really worst-case? 3 iops?

What if we have 10B rows in the column family? What sort of index do you use
that would only require one iop to find the row index block?

And what about multiple revisions of data, i.e., if there were N updates and M
deletes on the key before a major compaction? And what about Bloom Filter
false positives? What if the client asks a node that doesn't have the key?
None of those cause iops?
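
To make the question concrete, here is a rough sketch of how I read the quoted numbers. It assumes only what is stated above: one seek for the row index block, one for the row header, one more for the column block when the row is big, and all of that repeated for each sstable that contains the row. The function and parameter names are mine, purely for illustration, and it ignores caching, bloom filter false positives, and requests routed to the wrong node.

```python
def worst_case_seeks(sstables_with_row: int, big_row: bool) -> int:
    """Count seeks for one row read, per the quoted description.

    Hypothetical helper: names and structure are assumptions, not
    anything taken from the actual implementation.
    """
    # One seek for the row index block, one for the row header
    # (bloom filter + column index).
    per_sstable = 2
    # A big row needs one more seek to read the column block.
    if big_row:
        per_sstable += 1
    # The quoted costs apply to every sstable that contains the row.
    return sstables_with_row * per_sstable


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, worst_case_seeks(n, big_row=False), worst_case_seeks(n, big_row=True))
```

So "2 or 3" only holds if the row lives in a single sstable; with several uncompacted sstables the count multiplies accordingly.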

Forgive my naïveté, but having worked with large datasets all my life, I'm
having a really hard time wrapping my head around what sort of data
structures and cluster layout would allow you to retrieve data in so few
iops.

-- 
timeless(ness)
