cool, thanks. -david
On Apr 4, 2012, at 1:01 AM, Jonathan Ellis wrote:

> You need more than column_index_size_in_kb worth of column data for it
> to generate row header index entries. We have a cassandra.yaml in
> test/conf that sets that extra low, to 4, to make that easier. "ant
> test" sets up the environment to point to that yaml, but if you're
> running it from your IDE you might be missing that.
>
> Assuming that's working correctly, TableTest.testGetSliceFromLarge is
> a relevant example. In particular, note this part:
>
>     ArrayList<IndexHelper.IndexInfo> indexes = IndexHelper.deserializeIndex(file);
>     assert indexes.size() > 2;
>
> On Tue, Apr 3, 2012 at 6:23 PM, David Alves <davidral...@gmail.com> wrote:
>> Hi
>>
>> Jonathan: Thanks for the tip. Although the first option I proposed
>> would not incur that penalty, it would not take advantage of the column
>> index for the middle ranges.
>>
>> On a related matter, I'm struggling to test the IndexedBlockFetcher
>> implementation (SimpleBlockFetcher is working fine), as none of the tests
>> in ColumnFamilyStoreTest seem to use it (rowIndexEntry.columnsIndex().isEmpty()
>> is always true in ISR). Is there an easy way to have the column index
>> built for testing?
>>
>> Cheers
>> -david
>>
>> On Apr 3, 2012, at 5:58 AM, Jonathan Ellis wrote:
>>
>>> That would work, but I think the best approach would actually be to push
>>> multiple ranges down into ISR itself; otherwise you could waste a lot
>>> of time reading the row header redundantly (the
>>> skipBloomFilter/deserializeIndex part).
>>>
>>> The tricky part would be getting IndexedBlockFetcher to not do extra
>>> work in the case where the ranges' index blocks overlap -- in other
>>> words, the best of both worlds, where we "skip ahead" when the index
>>> says we can at the end of one range, but do a seq scan when that is
>>> more efficient.
>>>
>>> (Here's where I admit that I've asked several people to implement 3885
>>> as a technical interview problem for DataStax. For the purposes of
>>> that interview, this last part is optional.)
>>>
>>> On Mon, Apr 2, 2012 at 11:19 PM, David Alves <davidral...@gmail.com> wrote:
>>>> Hi guys
>>>>
>>>> I'm a PhD student and I'm trying to dip my feet in the water wrt
>>>> Cassandra development, as I'm a long-time fan.
>>>> I'm implementing CASSANDRA-3885, which pertains to supporting
>>>> returning multiple slices of a row.
>>>>
>>>> After looking around at the portion of the code involved, two
>>>> implementation options come to mind, and I'd like to get your feedback
>>>> on which you think might work best (or even on whether I'm on the
>>>> right track).
>>>>
>>>> As a first approach, I simply subclassed SliceQueryFilter (setting
>>>> start and finish to firstRange.start and lastRange.finish) and made the
>>>> subclass not return the elements in between the ranges (spinning forward
>>>> to the first element of the next range whenever the final element of the
>>>> previous one was found). This approach uses only one IndexedSliceReader,
>>>> but it scans everything from firstRange.start to lastRange.finish.
>>>>
>>>> Still, as I was finishing, it occurred to me that in cases where the
>>>> filter's selectivity is very low, i.e., the ranges are a sparse selection
>>>> of the total number of columns, I might be doing a full row scan for
>>>> nothing. So another option came to mind: an iterator of iterators, where
>>>> I use one IndexedSliceReader for each of the required slice ranges and
>>>> simply iterate through them.
>>>>
>>>> Which do you think is the better option? Am I making any sense, or am I
>>>> completely off track?
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Cheers
>>>> David Ribeiro Alves
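
[For illustration, a minimal sketch of the "iterator of iterators" option David
describes above, assuming sorted, non-overlapping ranges. ColumnRange,
SliceReaderFactory, and MultiRangeIterator are hypothetical stand-ins, not
Cassandra classes; in an actual patch the factory's role would be played by
opening one IndexedSliceReader (or SimpleBlockFetcher) per slice range.]

    import java.util.Collections;
    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;

    /** A single column slice; stands in for the start/finish pair a slice filter carries. */
    final class ColumnRange {
        final String start;
        final String finish;
        ColumnRange(String start, String finish) { this.start = start; this.finish = finish; }
    }

    /** Opens the per-range column iterator; in Cassandra this role would be played by an
     *  IndexedSliceReader (or SimpleBlockFetcher) created for a single slice range. */
    interface SliceReaderFactory<T> {
        Iterator<T> open(ColumnRange range);
    }

    /**
     * "Iterator of iterators": exposes the columns of several sorted, non-overlapping
     * ranges as one stream. Each range's reader is opened lazily, only once the
     * previous range has been exhausted.
     */
    final class MultiRangeIterator<T> implements Iterator<T> {
        private final Iterator<ColumnRange> ranges;
        private final SliceReaderFactory<T> factory;
        private Iterator<T> current = Collections.emptyIterator();

        MultiRangeIterator(List<ColumnRange> sortedRanges, SliceReaderFactory<T> factory) {
            this.ranges = sortedRanges.iterator();
            this.factory = factory;
        }

        public boolean hasNext() {
            // Advance to the next non-empty range reader, if any remain.
            while (!current.hasNext() && ranges.hasNext())
                current = factory.open(ranges.next());
            return current.hasNext();
        }

        public T next() {
            if (!hasNext())
                throw new NoSuchElementException();
            return current.next();
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

[Keeping the per-range readers unchanged is what makes this option simple; the
cost Jonathan points out is that each reader re-reads the row header (the
skipBloomFilter/deserializeIndex step).]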
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
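
[To make Jonathan's "skip ahead vs. seq scan" point concrete, a small sketch of
the decision it implies, assuming sorted ranges and an already-deserialized
column index. IndexBlock and BlockPlanner are hypothetical stand-ins for
IndexHelper.IndexInfo and the IndexedBlockFetcher bookkeeping, and column names
are compared as plain strings for simplicity.]

    import java.util.List;

    /** Hypothetical stand-in for IndexHelper.IndexInfo: one column-index block of a wide row. */
    final class IndexBlock {
        final String firstName;  // first column name covered by this block
        final String lastName;   // last column name covered by this block
        final long offset;       // position of this block within the row body
        IndexBlock(String firstName, String lastName, long offset) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.offset = offset;
        }
    }

    final class BlockPlanner {
        /**
         * Called after one slice range has been exhausted while positioned in block
         * `currentBlock`. If the next range still starts inside the current block,
         * returns currentBlock so the caller keeps scanning sequentially; if it
         * starts in a later block, returns that block so the caller can seek
         * ("skip ahead") to its offset instead of scanning the gap; returns -1 if
         * the next range starts past the last indexed column.
         */
        static int nextBlockFor(List<IndexBlock> index, int currentBlock, String nextRangeStart) {
            if (nextRangeStart.compareTo(index.get(currentBlock).lastName) <= 0)
                return currentBlock;
            for (int i = currentBlock + 1; i < index.size(); i++)
                if (nextRangeStart.compareTo(index.get(i).lastName) <= 0)
                    return i;
            return -1;
        }
    }

[The idea: when the next requested range still starts inside the block currently
being scanned, a seek buys nothing, so the fetcher keeps reading sequentially;
only when the range starts in a later block does it pay for a seek to that
block's offset.]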