Re: Bitmap indexes - reviving CASSANDRA-1472

Jonathan Ellis Fri, 12 Apr 2013 07:52:55 -0700

Something like this?

SELECT * FROM users
WHERE user_id IN (SELECT user_id FROM events WHERE type IN (1, 2, 3))
  AND user_id NOT IN (SELECT user_id FROM events WHERE type = 4)


This doesn't really look like a Cassandra query to me.  More like a
query for Hive (or Drill, or Impala).

But, I know Sylvain is looking forward to adding index support to
Collections [1], so something like this might fit:

SELECT * FROM users
WHERE (events CONTAINS 1 OR events CONTAINS 2 OR events CONTAINS 3)
   AND NOT (events CONTAINS 4)

However, even this is more than our current query planner can handle;
we don't really handle disjunctions at all, except for the special
case of IN on the partition key (which translates to multiget), let
alone arbitrary logical predicates.
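
To make the predicate above concrete, here is a minimal sketch (plain
Python, not Cassandra code) of how it reduces to bitmap operations. It
assumes users are numbered from 0 and that there is one bitmap per event
type, with bit i set when user i has that event. The names index,
bitmap, members and users_matching are hypothetical and exist only for
this illustration.

```python
# Illustrative only: plain Python ints used as bitmaps, not Cassandra code.
# Assumption: users are numbered 0..N-1 and bit i of a bitmap is set
# when user i has performed the given event type.

def bitmap(user_ids):
    """Build a bitmap (an int) with one bit set per user id."""
    bits = 0
    for uid in user_ids:
        bits |= 1 << uid
    return bits

def members(bits):
    """Return the sorted user ids whose bits are set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Hypothetical index: event type -> bitmap of users with that event.
index = {
    1: bitmap([0, 1]),
    2: bitmap([1, 2]),
    3: bitmap([3]),
    4: bitmap([1, 3, 5]),
}

def users_matching(index, any_of, none_of):
    """(CONTAINS a OR CONTAINS b ...) AND NOT (CONTAINS x OR ...)."""
    included = 0
    for event in any_of:
        included |= index.get(event, 0)   # union of the wanted events
    excluded = 0
    for event in none_of:
        excluded |= index.get(event, 0)   # union of the unwanted events
    return included & ~excluded           # set subtraction

result = members(users_matching(index, any_of=[1, 2, 3], none_of=[4]))
print(result)  # [0, 2]
```

If the intent is users who performed all of events 1, 2 and 3, the
union over any_of would become an intersection (&) instead. Either way
the bitmap arithmetic itself is short; deciding which indexes to
consult, and in what order, is the planner's job.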

I think that between "bitmap indexes" and "query planning," the latter
is actually the hard part.  QueryProcessor is about at the limits of
tractable complexity already; I think we'd need a new approach if we
want to handle arbitrarily complex predicates like that.

[1] https://issues.apache.org/jira/browse/CASSANDRA-4511


On Wed, Apr 10, 2013 at 4:40 PM, mrevilgnome <mrevilgn...@gmail.com> wrote:
> What do you think about set manipulation via indexes in Cassandra? I'm
> interested in answering queries such as give me all users that performed
> event 1, 2, and 3, but not 4. If the answer is yes, then I can make a case
> for spending my time on C*. The only downside for us would be our current
> prototype is in C++, so we would lose some performance and the ability to
> dedicate an entire machine to caching/performing queries.
>
>
> On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> If you mean, "Can someone help me figure out how to get started updating
>> these old patches to trunk and cleaning out the Avro?" then yes, I've been
>> knee-deep in indexing code recently.
>>
>>
>> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <mrevilgn...@gmail.com>
>> wrote:
>>
>> > I'm currently building a distributed cluster on top of Cassandra to
>> > perform fast set manipulation via bitmap indexes. This gives me the
>> > ability to perform unions, intersections, and set subtraction across
>> > sub-queries. Currently I'm storing index information for thousands of
>> > dimensions as Cassandra rows, and my cluster keeps this information
>> > cached, distributed, and replicated in order to answer queries.
>> >
>> > Every couple of days I think to myself this should really exist in C*.
>> > Given all the benefits, would there be any interest in
>> > reviving CASSANDRA-1472?
>> >
>> > One downside is that this is very memory intensive, even for sparse
>> > bitmaps.
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
