cool, thanks. -david
On Apr 4, 2012, at 1:01 AM, Jonathan Ellis wrote:

> You need more than column_index_size_in_kb worth of column data for it
> to generate row header index entries. We have a cassandra.yaml in
> test/conf that sets that extra low, to 4, to make that easier. "ant
> test" sets up the environment to point to that yaml, but if you're
> running it from your IDE you might be missing that.
>
> Assuming that's working correctly, TableTest.testGetSliceFromLarge is
> a relevant example. In particular, note this part:
>
>     ArrayList<IndexHelper.IndexInfo> indexes = IndexHelper.deserializeIndex(file);
>     assert indexes.size() > 2;
>
> On Tue, Apr 3, 2012 at 6:23 PM, David Alves <davidral...@gmail.com> wrote:
>> Hi
>>
>> Jonathan: Thanks for the tip. Although the first option I proposed
>> would not incur that penalty, it would not take advantage of the column
>> index for the middle ranges.
>>
>> On a related matter, I'm struggling to test the IndexedBlockFetcher
>> implementation (SimpleBlockFetcher is working fine), as none of the tests
>> in ColumnFamilyStoreTest seem to use it (rowIndexEntry.columnsIndex().isEmpty()
>> is always true in ISR). Is there an easy way to have the column index
>> built for testing?
>>
>> Cheers
>> -david
>>
>> On Apr 3, 2012, at 5:58 AM, Jonathan Ellis wrote:
>>
>>> That would work, but I think the best approach would actually be to push
>>> multiple ranges down into ISR itself; otherwise you could waste a lot
>>> of time reading the row header redundantly (the
>>> skipBloomFilter/deserializeIndex part).
>>>
>>> The tricky part would be getting IndexedBlockFetcher to not do extra
>>> work in the case where the ranges' index blocks overlap -- in other
>>> words, the best of both worlds, where we "skip ahead" when the index
>>> says we can at the end of one range, but do a seq scan when that is
>>> more efficient.
>>>
>>> (Here's where I admit that I've asked several people to implement 3885
>>> as a technical interview problem for DataStax. For the purposes of
>>> that interview, this last part is optional.)
>>>
>>> On Mon, Apr 2, 2012 at 11:19 PM, David Alves <davidral...@gmail.com> wrote:
>>>> Hi guys
>>>>
>>>> I'm a PhD student and I'm trying to dip my feet in the water wrt
>>>> Cassandra development, as I'm a long-time fan.
>>>> I'm implementing CASSANDRA-3885, which pertains to supporting
>>>> returning multiple slices of a row.
>>>>
>>>> After looking around at the portion of the code involved, two
>>>> implementation options come to mind, and I'd like to get your feedback
>>>> on which you think might work best (or even on whether I'm on the
>>>> right track).
>>>>
>>>> As a first approach, I simply subclassed SliceQueryFilter (setting
>>>> start and finish to firstRange.start and lastRange.finish) and made the
>>>> subclass not return the elements in between the ranges (spinning forward
>>>> to the first element of the next range whenever the final element of the
>>>> previous one was found). This approach uses only one IndexedSliceReader,
>>>> but it scans everything from firstRange.start to lastRange.finish.
>>>>
>>>> Still, as I was finishing, it occurred to me that in cases where the
>>>> filter's selectivity is very low, i.e., the ranges are a sparse selection
>>>> of the total number of columns, I might be doing a full row scan for
>>>> nothing. So another option came to mind: an iterator of iterators, where
>>>> I use one IndexedSliceReader for each of the required slice ranges and
>>>> simply iterate through them.
>>>>
>>>> Which do you think is the better option? Am I making any sense, or am I
>>>> completely off track?
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Cheers
>>>> David Ribeiro Alves
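
[For illustration, a minimal sketch of the "iterator of iterators" option David
describes above, assuming sorted, non-overlapping ranges. ColumnRange,
SliceReaderFactory, and MultiRangeIterator are hypothetical stand-ins, not
Cassandra classes; in an actual patch the factory's role would be played by
opening one IndexedSliceReader (or SimpleBlockFetcher) per slice range.]

    import java.util.Collections;
    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;

    /** A single column slice; stands in for the start/finish pair a slice filter carries. */
    final class ColumnRange {
        final String start;
        final String finish;
        ColumnRange(String start, String finish) { this.start = start; this.finish = finish; }
    }

    /** Opens the per-range column iterator; in Cassandra this role would be played by an
     *  IndexedSliceReader (or SimpleBlockFetcher) created for a single slice range. */
    interface SliceReaderFactory<T> {
        Iterator<T> open(ColumnRange range);
    }

    /**
     * "Iterator of iterators": exposes the columns of several sorted, non-overlapping
     * ranges as one stream. Each range's reader is opened lazily, only once the
     * previous range has been exhausted.
     */
    final class MultiRangeIterator<T> implements Iterator<T> {
        private final Iterator<ColumnRange> ranges;
        private final SliceReaderFactory<T> factory;
        private Iterator<T> current = Collections.emptyIterator();

        MultiRangeIterator(List<ColumnRange> sortedRanges, SliceReaderFactory<T> factory) {
            this.ranges = sortedRanges.iterator();
            this.factory = factory;
        }

        public boolean hasNext() {
            // Advance to the next non-empty range reader, if any remain.
            while (!current.hasNext() && ranges.hasNext())
                current = factory.open(ranges.next());
            return current.hasNext();
        }

        public T next() {
            if (!hasNext())
                throw new NoSuchElementException();
            return current.next();
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

[Keeping the per-range readers unchanged is what makes this option simple; the
cost Jonathan points out is that each reader re-reads the row header (the
skipBloomFilter/deserializeIndex step).]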
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
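
[To make Jonathan's "skip ahead vs. seq scan" point concrete, a small sketch of
the decision it implies, assuming sorted ranges and an already-deserialized
column index. IndexBlock and BlockPlanner are hypothetical stand-ins for
IndexHelper.IndexInfo and the IndexedBlockFetcher bookkeeping, and column names
are compared as plain strings for simplicity.]

    import java.util.List;

    /** Hypothetical stand-in for IndexHelper.IndexInfo: one column-index block of a wide row. */
    final class IndexBlock {
        final String firstName;  // first column name covered by this block
        final String lastName;   // last column name covered by this block
        final long offset;       // position of this block within the row body
        IndexBlock(String firstName, String lastName, long offset) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.offset = offset;
        }
    }

    final class BlockPlanner {
        /**
         * Called after one slice range has been exhausted while positioned in block
         * `currentBlock`. If the next range still starts inside the current block,
         * returns currentBlock so the caller keeps scanning sequentially; if it
         * starts in a later block, returns that block so the caller can seek
         * ("skip ahead") to its offset instead of scanning the gap; returns -1 if
         * the next range starts past the last indexed column.
         */
        static int nextBlockFor(List<IndexBlock> index, int currentBlock, String nextRangeStart) {
            if (nextRangeStart.compareTo(index.get(currentBlock).lastName) <= 0)
                return currentBlock;
            for (int i = currentBlock + 1; i < index.size(); i++)
                if (nextRangeStart.compareTo(index.get(i).lastName) <= 0)
                    return i;
            return -1;
        }
    }

[The idea: when the next requested range still starts inside the block currently
being scanned, a seek buys nothing, so the fetcher keeps reading sequentially;
only when the range starts in a later block does it pay for a seek to that
block's offset.]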