That would work, but I think the best approach would actually be to push multiple ranges down into ISR (IndexedSliceReader) itself; otherwise you could waste a lot of time reading the row header redundantly (the skipBloomFilter/deserializeIndex part), once per range.
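To make the idea concrete, here is a rough sketch of the planning step a single reader could do once the row index has been deserialized. It is not Cassandra code: MultiRangePlan, IndexEntry, Range, blocksToRead and seeks are hypothetical names, column names are modelled as plain ints, and the sketch assumes the index entries and the requested ranges are both sorted and that the ranges do not overlap each other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model (not Cassandra's classes): column names are
// ints, and each index entry covers the inclusive name span of one block.
final class MultiRangePlan {
    static final class IndexEntry {
        final int first, last;
        IndexEntry(int first, int last) { this.first = first; this.last = last; }
    }

    static final class Range {
        final int start, finish;
        Range(int start, int finish) { this.start = start; this.finish = finish; }
    }

    // Returns the positions of the index blocks to read, in order, each at
    // most once. Assumes index and ranges are sorted and ranges are disjoint.
    static List<Integer> blocksToRead(List<IndexEntry> index, List<Range> ranges) {
        List<Integer> plan = new ArrayList<>();
        int block = 0;
        for (Range r : ranges) {
            // Skip ahead past blocks that end before this range starts.
            while (block < index.size() && index.get(block).last < r.start) {
                block++;
            }
            // Take every block that intersects the range, unless the previous
            // range already scheduled it (only the last planned block can repeat).
            int b = block;
            while (b < index.size() && index.get(b).first <= r.finish) {
                if (plan.isEmpty() || plan.get(plan.size() - 1) != b) {
                    plan.add(b);
                }
                b++;
            }
            // The last intersecting block may also serve the next range.
            if (b > block) {
                block = b - 1;
            }
        }
        return plan;
    }

    // Counts seeks: one for the first block, plus one for every planned block
    // that does not directly follow the previously planned block.
    static int seeks(List<Integer> plan) {
        int seeks = 0;
        for (int i = 0; i < plan.size(); i++) {
            if (i == 0 || plan.get(i) != plan.get(i - 1) + 1) {
                seeks++;
            }
        }
        return seeks;
    }
}
```

With five blocks covering names 0-9, 10-19, 20-29, 30-39 and 40-49, and the ranges [3,5], [8,12] and [41,45], blocksToRead returns blocks 0, 1 and 4: block 0 is shared by the first two ranges and is read once, blocks 0 and 1 are adjacent and can be scanned sequentially, and only the jump to block 4 costs an extra seek.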
The tricky part would be getting IndexedBlockFetcher to not do extra work in the case where the ranges' index blocks overlap -- in other words, getting the best of both worlds, where we "skip ahead" at the end of one range when the index says we can, but do a seq scan when that is more efficient.

(Here's where I admit that I've asked several people to implement 3885 as a technical interview problem for DataStax. For the purposes of that interview, this last part is optional.)

On Mon, Apr 2, 2012 at 11:19 PM, David Alves <davidral...@gmail.com> wrote:
> Hi guys
>
> I'm a PhD student and I'm trying to dip my feet in the water wrt
> cassandra development, as I'm a long time fan.
> I'm implementing CASSANDRA-3885 which pertains to supporting returning
> multiple slices of a row.
>
> After looking around at the portion of the code that is involved, two
> implementation options come to mind and I'd like to get feedback from you on
> whichever you think might work best (or even if I'm on the right track).
>
> As a first approach I simply subclassed SliceQueryFilter (setting
> start and finish to firstRange.start and lastRange.finish) and made the
> subclass not return the elements in between the ranges (spinning to the first
> element of the next range whenever the final element of the previous was
> found). This approach only uses one IndexedSliceReader but it scans from
> firstRange.start to lastRange.finish.
>
> Still, when I was finishing it came to mind that in cases where the
> filter's selectivity is very low, i.e., the ranges are a sparse selection of
> the total number of columns, I might be doing a full row scan for nothing, so
> another option came to mind: an iterator of iterators where I use multiple
> IndexedSliceReaders, one for each of the required slice ranges, and simply
> iterate through them.
>
> Which do you think is the better option? Am I making any sense, or am
> I completely off track?
>
> Any help would be greatly appreciated.
>
> Cheers
> David Ribeiro Alves
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com