Yes Aaron, I was not clear about bloom filters. I was thinking of the column bloom filters, for the case where I specify an absolute value for part 1 of the composite column and a start/end value for part 2 of the composite column.
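To make that query shape concrete, here is a rough Python sketch. It is not Cassandra code: `slice_columns`, `row` and the sample data are made up for illustration, and it only assumes that the columns of a row are kept sorted by (part 1, part 2). Under that assumption, an exact part 1 plus a start on part 2 becomes a binary search followed by a short forward scan:

```python
from bisect import bisect_left

def slice_columns(columns, part1, start_part2, limit):
    """Return up to `limit` columns named (part1, p2) with p2 >= start_part2.

    `columns` is a list of ((part1, part2), value) tuples sorted by name.
    This is an illustrative model only, not how Cassandra stores a row.
    """
    names = [name for name, _ in columns]
    # Binary search for the first column name >= (part1, start_part2).
    i = bisect_left(names, (part1, start_part2))
    out = []
    # Scan forward while part 1 still matches and the limit is not reached.
    while i < len(columns) and len(out) < limit and columns[i][0][0] == part1:
        out.append(columns[i])
        i += 1
    return out

# Hypothetical row contents, sorted by composite name.
row = [
    (("inbox", 1), "a"),
    (("inbox", 5), "b"),
    (("inbox", 9), "c"),
    (("sent", 2), "d"),
]

print(slice_columns(row, "inbox", 4, 10))
```

A bloom filter can only answer "is this exact name possibly present?", so it cannot help with the start/limit part of such a query. That is what the sorted order (or an index over it) is for.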
It is slowly dawning on me that I need a super-column to use column blooms effectively, and at the same time I don't want the entire sub-column list deserialized.

In fact, for my use case I also do not need a column sampling index. I would much prefer a multi-level skip-list.

Is there a way to customize how Cassandra writes/reads its key/column indexes to SSTables? Any hooks/API available as of now would be greatly helpful.

On Fri, Sep 14, 2012 at 10:33 AM, aaron morton <aa...@thelastpickle.com> wrote:

> Range queries do not use bloom filters.
>
> Are you talking about row range queries? Or a slice of columns in a row?
>
> If you are getting a slice of columns from a single row, a bloom filter is
> used to locate the row.
> If you are getting a slice of columns from a range of rows, the bloom
> filter is used to locate the first row. After that it is a scan.
>
> There are also row level bloom filters for columns on a row. These are
> used when you get columns by name. If you are doing a slice with a start, the
> bloom filter is not used; instead the row level column index is used (if
> present).
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/09/2012, at 2:30 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> Thanks for the clarification. Even though compression solves the disk space
> issue, we might still have Memtable bloat, right?
>
> There is another issue to be handled for us. The queries are always going
> to be range queries with an absolute match on part 1 and a range on part 2 of the
> composite columns.
>
> Ex: Query <some-key> <Column-part-1> <Start-Id-part-2> <Limit>
>
> Range queries do not use bloom filters. It holds good for
> composite-columns also, right? I believe I will end up writing BF bytes only
> to skip it later.
>
> If sharing had been possible, then <Column-part-1> alone could have gone
> into the bloom-filter, speeding up my queries really effectively.
>
> But as I understand, there are many levels of nesting possible in a
> composite type and casing at every level is a big task.
>
> Maybe casing for the top-level or the first-part should be a good start?
>
> --
> Ravi
>
> On Wed, Sep 12, 2012 at 5:46 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>
>> > Is every <string>/<id> combination stored separately in disk
>>
>> Yes, each combination is stored separately on disk (the storage engine
>> itself doesn't have special casing for composite columns, at least not
>> yet). But as far as disk space is concerned, I suspect that sstable
>> compression makes this largely a non-issue.
>>
>> --
>> Sylvain