Re: Bitmap indexes - reviving CASSANDRA-1472

Edward Capriolo Fri, 12 Apr 2013 08:04:38 -0700

I am not sure about the collection case, but for compact storage you can
specify multiple ranges in a slice query.


https://issues.apache.org/jira/browse/CASSANDRA-3885

I am not sure this will get you all the way to bitmap indexes, but in a
wide-row scenario it seems like you could support an "event contains 1 or
event contains 2 or event contains 3" query.
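As a rough illustration of that idea, here is a Python sketch that assumes one wide row per user whose sorted column names are event type ids. `multi_range_slice` and `row_contains_any` are hypothetical in-memory stand-ins, not Cassandra's Thrift or CQL API; the OR simply becomes one single-column range per value.

```python
from bisect import bisect_left, bisect_right


def multi_range_slice(columns, ranges):
    """Return the column names of a sorted wide row that fall inside any of
    the inclusive (start, finish) ranges, in column order, without duplicates.
    Hypothetical stand-in for a multi-range slice query."""
    hits = set()
    for start, finish in ranges:
        lo = bisect_left(columns, start)    # first column >= start
        hi = bisect_right(columns, finish)  # first column > finish
        hits.update(columns[lo:hi])
    return sorted(hits)


def row_contains_any(columns, wanted):
    # "event contains 1 or event contains 2 or ...": one (v, v) range per value.
    return bool(multi_range_slice(columns, [(v, v) for v in wanted]))


# Assumed layout: one user's wide row, column names are event type ids.
user_row = [1, 5, 7, 12]
print(multi_range_slice(user_row, [(1, 3), (6, 8)]))  # [1, 7]
print(row_contains_any(user_row, [2, 3, 7]))          # True
```

Because the ranges are evaluated independently and merged, the same slice covers both point lookups and genuine ranges.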

I am not sure how arbitrarily complex the CQL query handler can or will
become. For intravert (something I am dabbling with), the concept is to apply
a server-side function to the result of a slice.
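A minimal sketch of that filter idea in Python, assuming a slice result is a list of (column name, value) pairs. `filter_slice` and the "event:<type>" column layout are hypothetical and are not intravert's actual API; the point is only that the predicate runs over the slice output rather than being expressed in the query language.

```python
from typing import Callable, Iterable, List, Tuple

Column = Tuple[str, str]  # (column name, value)


def filter_slice(slice_result: Iterable[Column],
                 predicate: Callable[[Column], bool]) -> List[Column]:
    """Apply a caller-supplied function to each column of a slice result and
    keep only the columns it accepts (hypothetical, not the intravert API)."""
    return [col for col in slice_result if predicate(col)]


# Assumed layout: one wide row with "event:<type>" -> ISO date columns.
slice_result = [("event:1", "2013-04-01"), ("event:2", "2013-04-03"),
                ("event:4", "2013-04-05")]
recent = filter_slice(slice_result, lambda col: col[1] >= "2013-04-02")
print(recent)  # [('event:2', '2013-04-03'), ('event:4', '2013-04-05')]
```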

https://github.com/zznate/intravert-ug/wiki/Filter-mode

There is a huge win in having multiple indexes behind the pluggable index
support, because not all of the pluggable indexes and query options will be
easy to CQL-ify.




On Fri, Apr 12, 2013 at 10:52 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> Something like this?
>
> SELECT * FROM users
> WHERE user_id IN (select user_id from events where type in (1, 2, 3))
>   AND user_id NOT IN (select user_id from events where type=4)
>
> This doesn't really look like a Cassandra query to me.  More like a
> query for Hive (or Drill, or Impala).
>
> But, I know Sylvain is looking forward to adding index support to
> Collections [1], so something like this might fit:
>
> SELECT * FROM users
> WHERE (events CONTAINS 1 OR events CONTAINS 2 OR events CONTAINS 3)
>    AND NOT (events CONTAINS 4)
>
> However, even this is more than our current query planner can handle;
> we don't really handle disjunctions at all, except for the special
> case of IN on the partition key (which translates to multiget), let
> alone arbitrary logical predicates.
>
> I think that between "bitmap indexes" and "query planning," the latter
> is actually the hard part.  QueryProcessor is about at the limits of
> tractable complexity already; I think we'd need a new approach if we
> want to handle arbitrarily complex predicates like that.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-4511
>
>
> On Wed, Apr 10, 2013 at 4:40 PM, mrevilgnome <mrevilgn...@gmail.com> wrote:
> > What do you think about set manipulation via indexes in Cassandra? I'm
> > interested in answering queries such as "give me all users that performed
> > events 1, 2, and 3, but not 4." If the answer is yes, then I can make a case
> > for spending my time on C*. The only downside for us would be that our
> > current prototype is in C++, so we would lose some performance and the
> > ability to dedicate an entire machine to caching/performing queries.
> >
> >
> > On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >
> >> If you mean, "Can someone help me figure out how to get started updating
> >> these old patches to trunk and cleaning out the Avro?", then yes; I've been
> >> knee-deep in indexing code recently.
> >>
> >>
> >> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <mrevilgn...@gmail.com> wrote:
> >>
> >> > I'm currently building a distributed cluster on top of Cassandra to perform
> >> > fast set manipulation via bitmap indexes. This gives me the ability to
> >> > perform unions, intersections, and set subtraction across sub-queries.
> >> > Currently I'm storing index information for thousands of dimensions as
> >> > Cassandra rows, and my cluster keeps this information cached, distributed,
> >> > and replicated in order to answer queries.
> >> >
> >> > Every couple of days I think to myself that this should really exist in C*.
> >> > Given all the benefits, would there be any interest in
> >> > reviving CASSANDRA-1472?
> >> >
> >> > One downside is that this is very memory intensive, even for sparse
> >> > bitmaps.
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder, http://www.datastax.com
> >> @spyced
> >>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>
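The set manipulation discussed in this thread (unions, intersections, and set subtraction over per-event user sets) can be sketched with a toy, uncompressed bitmap index in Python. `BitmapIndex`, `members`, and the sample data are hypothetical illustrations, not the CASSANDRA-1472 design; Python ints stand in for arbitrary-length bitsets, with bit i set when user i performed the event.

```python
from functools import reduce
from operator import and_, or_


class BitmapIndex:
    """Toy, uncompressed bitmap index: one bitmap per event type, with bit i
    set when user i performed that event. Hypothetical illustration only."""

    def __init__(self):
        self.bitmaps = {}  # event type -> int used as a bitset of user ids

    def add(self, event_type, user_id):
        self.bitmaps[event_type] = self.bitmaps.get(event_type, 0) | (1 << user_id)

    def bitmap(self, event_type):
        return self.bitmaps.get(event_type, 0)

    def all_of(self, event_types):
        # Intersection: users who performed every listed event (0 if none listed).
        maps = [self.bitmap(e) for e in event_types]
        return reduce(and_, maps) if maps else 0

    def any_of(self, event_types):
        # Union: users who performed at least one listed event.
        return reduce(or_, (self.bitmap(e) for e in event_types), 0)


def members(bitmap):
    """Decode a non-negative bitset into the sorted list of set user ids."""
    ids, user_id = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(user_id)
        bitmap >>= 1
        user_id += 1
    return ids


# Made-up sample data: user id -> event types performed.
history = {0: [1, 2, 3], 1: [1, 2, 3, 4], 2: [1, 3], 3: [2], 4: [1, 2, 3]}
idx = BitmapIndex()
for user, events in history.items():
    for event in events:
        idx.add(event, user)

# Set subtraction is "a & ~b"; with non-negative a the result stays non-negative.
did_all_but_4 = idx.all_of([1, 2, 3]) & ~idx.bitmap(4)
did_any_but_4 = idx.any_of([1, 2, 3]) & ~idx.bitmap(4)
print(members(did_all_but_4))  # [0, 4]
print(members(did_any_but_4))  # [0, 2, 3, 4]
```

The request to find users who "performed events 1, 2, and 3, but not 4" is an intersection, while the CQL sketches built on `type in (1, 2, 3)` or `CONTAINS ... OR` express a union, so the two queries return different users on this data. The uncompressed bitsets also show where the memory cost comes from: each bitmap takes roughly one bit per user id up to the highest id it contains, however sparse it is.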
