> It is slowly dawning on me that I need a super-column to use column blooms 
> effectively and at the same time don't want the entire sub-column list 
> deserialized. 
Queries by name use the row level bloom filter, regardless of the CF type. 
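
To illustrate the difference between a by-name lookup and a slice, here is a toy Python sketch. It is not Cassandra's implementation: `ToyBloomFilter`, `ToyRow`, and their methods are hypothetical names, and the hashing scheme is an assumption chosen for brevity. It only shows why an exact-name query can consult a bloom filter, while a slice from a start name has to seek in a sorted structure instead.

```python
import bisect
import hashlib


class ToyBloomFilter:
    """Tiny bloom filter for illustration; not Cassandra's implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, name):
        # Derive num_hashes bit positions by salting the name with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits[pos] = 1

    def might_contain(self, name):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(name))


class ToyRow:
    """Hypothetical stand-in for one row's columns in one SSTable."""

    def __init__(self, columns):
        self.names = sorted(columns)  # column names kept in sorted order
        self.values = dict(columns)
        self.bloom = ToyBloomFilter()
        for name in self.names:
            self.bloom.add(name)

    def get_by_name(self, name):
        # Exact-name lookup: the filter can rule the column out cheaply.
        if not self.bloom.might_contain(name):
            return None
        return self.values.get(name)

    def get_slice(self, start, limit):
        # Slice from a start name: a bloom filter only answers "is this exact
        # name present?", so it cannot help; seek in the sorted names instead.
        i = bisect.bisect_left(self.names, start)
        return [(n, self.values[n]) for n in self.names[i:i + limit]]
```

For example, with `row = ToyRow({"a:1": "x", "a:2": "y", "b:1": "z"})`, `row.get_by_name("a:2")` goes through the filter first, whereas `row.get_slice("a:2", 2)` never touches it.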

> In fact, for my use-case I also do not need a column sampling index. Rather I 
> would much prefer a multi-level skip-list
Are you thinking about performance or functionality? If it's performance, do 
you have an example of something that needs optimisation?

> Is there a way to customize how Cassandra writes/reads its key/column 
> indexes to SSTables?
No.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 2:44 AM, Ravikumar Govindarajan 
<ravikumar.govindara...@gmail.com> wrote:

> Yes Aaron, I was not clear about Bloom Filters. I was thinking about the 
> column bloom filters when I specify an absolute value for Part1 of the 
> composite column and a start/end value for Part2 of the composite column
> 
> It is slowly dawning on me that I need a super-column to use column blooms 
> effectively and at the same time don't want the entire sub-column list 
> deserialized. 
> 
> In fact, for my use-case I also do not need a column sampling index. Rather I 
> would much prefer a multi-level skip-list
> 
> Is there a way to customize how Cassandra writes/reads its key/column 
> indexes to SSTables? Any hooks/APIs that are available as of now would be 
> greatly helpful.
> 
> On Fri, Sep 14, 2012 at 10:33 AM, aaron morton <aa...@thelastpickle.com> 
> wrote:
>> Range queries do not use bloom filters. 
> Are you talking about row range queries? Or a slice of columns in a row? 
> 
> If you are getting a slice of columns from a single row, a bloom filter is 
> used to locate the row. 
> If you are getting a slice of columns from a range of rows, the bloom filter 
> is used to locate the first row. After that it is a scan. 
> 
> There are also row level bloom filters for columns on a row. These are used 
> when you query columns by name. If you are doing a slice with a start, the 
> bloom filter is not used; instead the row level column index is used (if 
> present). 
> 
> Hope that helps. 
> 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 13/09/2012, at 2:30 AM, Ravikumar Govindarajan 
> <ravikumar.govindara...@gmail.com> wrote:
> 
>> Thanks for the clarification. Even though compression solves the disk space 
>> issue, we might still have Memtable bloat, right?
>> 
>> There is another issue to be handled for us. The queries are always going to 
>> be range queries with absolute match on part1 and range on part 2 of the 
>> composite columns
>> 
>> Ex: Query <some-key> <Column-part-1> <Start-Id-part-2> <Limit> 
>> 
>> Range queries do not use bloom filters. That holds for composite columns 
>> too, right? I believe I will end up writing BF bytes only to skip them later.
>> 
>> If sharing had been possible, then <Column-part-1> alone could have gone 
>> into the bloom-filter, speeding up my queries really effectively.
>> 
>> But as I understand, there are many levels of nesting possible in a 
>> composite type, and special-casing at every level is a big task.
>> 
>> Maybe special-casing the top level or the first part would be a good start?
>> 
>> --
>> Ravi
>> 
>> On Wed, Sep 12, 2012 at 5:46 PM, Sylvain Lebresne <sylv...@datastax.com> 
>> wrote:
>> > Is every <string>/<id> combination stored separately in disk
>> 
>> Yes, each combination is stored separately on disk (the storage engine
>> itself doesn't have special casing for composite columns, at least not
>> yet). But as far as disk space is concerned, I suspect that sstable
>> compression makes this largely a non-issue.
>> 
>> --
>> Sylvain
>> 
> 
> 
