> It is slowly dawning on me that I need a super-column to use column blooms
> effectively and at the same time don't want the entire sub-column list
> deserialized.

Queries by name use the row level bloom filter, regardless of the CF type.

> In fact, for my use-case I also do not need a column sampling index. Rather I
> would much prefer a multi-level skip-list

Are you thinking about performance or functionality? If it's performance, do you have an example of something that needs optimisation?

> Is there a way to customize how cassandra writes/reads its key/column
> indexes to SSTables.

No.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 2:44 AM, Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote:

> Yes Aaron, I was not clear about Bloom Filters. I was thinking about the
> column bloom filters when I specify an absolute value for Part1 of the
> composite column and a start/end value for Part2 of the composite column
>
> It is slowly dawning on me that I need a super-column to use column blooms
> effectively and at the same time don't want the entire sub-column list
> deserialized.
>
> In fact, for my use-case I also do not need a column sampling index. Rather I
> would much prefer a multi-level skip-list
>
> Is there a way to customize how cassandra writes/reads its key/column
> indexes to SSTables. Any hooks/API that is available as of now should be
> greatly helpful
>
> On Fri, Sep 14, 2012 at 10:33 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>> Range queries do not use bloom filters.
> Are you talking about row range queries? Or a slice of columns in a row?
>
> If you are getting a slice of columns from a single row, a bloom filter is
> used to locate the row.
> If you are getting a slice of columns from a range of rows, the bloom filter
> is used to locate the first row. After that it is a scan.
>
> There are also row level bloom filters for columns on a row. These are used
> when you query columns by name. If you are doing a slice with a start, the
> bloom filter is not used; instead the row level column index is used (if
> present).
>
> Hope that helps.
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/09/2012, at 2:30 AM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
>
>> Thanks for the clarification. Even though compression solves the disk space
>> issue, we might still have Memtable bloat, right?
>>
>> There is another issue to be handled for us. The queries are always going to
>> be range queries with absolute match on part1 and range on part 2 of the
>> composite columns
>>
>> Ex: Query <some-key> <Column-part-1> <Start-Id-part-2> <Limit>
>>
>> Range queries do not use bloom filters. It holds good for composite-columns
>> also, right? I believe I will end up writing BF bytes only to skip them later.
>>
>> If sharing had been possible, then <Column-part-1> alone could have gone
>> into the bloom-filter, speeding up my queries really effectively.
>>
>> But as I understand, there are many levels of nesting possible in a
>> composite type and casing at every level is a big task
>>
>> Maybe casing for the top-level or the first-part should be a good start?
>>
>> --
>> Ravi
>>
>> On Wed, Sep 12, 2012 at 5:46 PM, Sylvain Lebresne <sylv...@datastax.com>
>> wrote:
>> > Is every <string>/<id> combination stored separately in disk
>>
>> Yes, each combination is stored separately on disk (the storage engine
>> itself doesn't have special casing for composite column, at least not
>> yet). But as far as disk space is concerned, I suspect that sstable
>> compression makes this largely a non issue.
>>
>> --
>> Sylvain
>>
>
>
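
The two read paths discussed in this thread (a row-level bloom filter for by-name lookups, and a column index for slices with a start) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's actual code or on-disk format: `ToyBloom`, `ToyRow`, `block_size`, and the use of Python tuples for composite column names are all hypothetical.

```python
import bisect
import hashlib


class ToyBloom:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, bits=256, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit set stored in a Python int

    def _positions(self, key):
        # Derive `hashes` bit positions from the key (hypothetical scheme).
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.field |= 1 << p

    def might_contain(self, key):
        return all((self.field >> p) & 1 for p in self._positions(key))


class ToyRow:
    """One row: sorted composite column names plus two helper structures."""

    def __init__(self, columns, block_size=2):
        self.values = dict(columns)
        self.names = sorted(self.values)  # tuples sort lexicographically
        self.block_size = block_size
        # Bloom filter over full column names, consulted for by-name reads.
        self.bloom = ToyBloom()
        for name in self.names:
            self.bloom.add(name)
        # Column index: the first column name of every block of columns.
        self.index = self.names[::block_size]

    def get_by_name(self, name):
        # By-name read: the bloom filter can rule the column out cheaply.
        if not self.bloom.might_contain(name):
            return None
        return self.values.get(name)

    def slice(self, part1, start_part2, limit):
        # Slice read: the bloom filter is not consulted. The column index
        # picks the block to start from, then columns are scanned in order.
        start = (part1, start_part2)
        block = max(bisect.bisect_right(self.index, start) - 1, 0)
        result = []
        for name in self.names[block * self.block_size:]:
            if name < start:
                continue  # still before the requested start
            if name[0] != part1:
                break  # left the requested first component
            result.append((name, self.values[name]))
            if len(result) == limit:
                break
        return result
```

In this model a query of the form `<Column-part-1> <Start-Id-part-2> <Limit>` maps to `row.slice(part1, start_part2, limit)`, which never touches the bloom filter, while `row.get_by_name((part1, part2))` does.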