Re: implementation choice with regard to multiple range slice query filters

Jonathan Ellis Mon, 02 Apr 2012 21:58:56 -0700

That would work, but I think the best approach would actually be to
push multiple ranges down into ISR (IndexedSliceReader) itself.
Otherwise you could waste a lot of time reading the row header
redundantly (the skipBloomFilter/deserializeIndex part).


The tricky part would be getting IndexedBlockFetcher to not do extra
work in the case where the ranges' index blocks overlap. In other
words, we want the best of both worlds: "skip ahead" at the end of one
range when the index says we can, but do a sequential scan when that
is more efficient.

(Here's where I admit that I've asked several people to implement 3885
as a technical interview problem for DataStax.  For the purposes of
that interview, this last part is optional.)
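
To make the block-overlap point concrete, here is a rough sketch of the
planning step. It is Python for brevity, not Cassandra's Java, and
plan_block_reads and the (first_name, last_name) index layout are made
up for illustration; this is not the IndexedBlockFetcher API. It assumes
the index entries are sorted and non-overlapping, and that the requested
ranges are sorted, non-overlapping, and inclusive on both ends.

```python
from bisect import bisect_left


def plan_block_reads(index, ranges):
    """Group the index blocks needed by `ranges` into sequential runs.

    index:  list of (first_name, last_name) per block (hypothetical layout).
    ranges: list of (start, finish) column-name ranges, inclusive.
    Returns a list of (first_block, last_block) runs. Each run is read
    sequentially; the reader seeks ("skips ahead") between runs.
    """
    last_names = [last for _, last in index]
    runs = []
    for start, finish in ranges:
        # First block that could contain `start`.
        i = bisect_left(last_names, start)
        # Walk forward while the block still begins at or before `finish`.
        while i < len(index) and index[i][0] <= finish:
            if runs and i <= runs[-1][1]:
                pass  # block already scheduled by an earlier range
            elif runs and i == runs[-1][1] + 1:
                runs[-1][1] = i  # adjacent block: keep scanning sequentially
            else:
                runs.append([i, i])  # gap in the index: skip ahead
            i += 1
    return [tuple(run) for run in runs]


index = [("a", "c"), ("d", "f"), ("g", "i"), ("j", "l"), ("m", "o")]
ranges = [("b", "b"), ("c", "e"), ("m", "n")]
print(plan_block_reads(index, ranges))  # [(0, 1), (4, 4)]
```

The first two ranges share block 0, so it is scheduled once and the scan
simply continues into block 1; blocks 2 and 3 are skipped entirely
before the reader seeks to block 4.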

On Mon, Apr 2, 2012 at 11:19 PM, David Alves <davidral...@gmail.com> wrote:
> Hi guys
>
>        I'm a PhD student and I'm trying to dip my feet in the water wrt
> Cassandra development, as I'm a long-time fan.
>        I'm implementing CASSANDRA-3885 which pertains to supporting returning 
> multiple slices of a row.
>
>        After looking around at the portion of the code that is involved, two
> implementation options come to mind, and I'd like to get feedback from you on
> whichever you think might work best (or even whether I'm on the right track).
>
>        As a first approach I simply subclassed SliceQueryFilter (setting 
> start and finish to firstRange.start and lastRange.finish) and made the 
> subclass not return the elements in between the ranges (spinning to the first 
> element of the next range whenever the final element of the previous was 
> found). This approach only uses one IndexedSliceReader but it scans from 
> firstRange.start to lastRange.finish.
>
>        Still, when I was finishing, it came to mind that in cases where the
> filter's selectivity is very low, i.e., the ranges are a sparse selection of
> the total number of columns, I might be doing a full row scan for nothing. So
> another option came to mind: an iterator of iterators, where I use a separate
> IndexedSliceReader for each of the required slice ranges and simply iterate
> through them.
>
>        Which do you think is the better option? Am I making any sense, or am 
> I completely off track?
>
>        Any help would be greatly appreciated.
>
> Cheers
> David Ribeiro Alves
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
