Something like this?

    SELECT * FROM users
    WHERE user_id IN (SELECT user_id FROM events WHERE type IN (1, 2, 3))
      AND user_id NOT IN (SELECT user_id FROM events WHERE type = 4)

This doesn't really look like a Cassandra query to me. More like a query for Hive (or Drill, or Impala).

But, I know Sylvain is looking forward to adding index support to Collections [1], so something like this might fit:

    SELECT * FROM users
    WHERE (events CONTAINS 1 OR events CONTAINS 2 OR events CONTAINS 3)
      AND NOT (events CONTAINS 4)

However, even this is more than our current query planner can handle; we don't really handle disjunctions at all, except for the special case of IN on the partition key (which translates to multiget), let alone arbitrary logical predicates.

I think that between "bitmap indexes" and "query planning," the latter is actually the hard part. QueryProcessor is about at the limits of tractable complexity already; I think we'd need a new approach if we want to handle arbitrarily complex predicates like that.

[1] https://issues.apache.org/jira/browse/CASSANDRA-4511

On Wed, Apr 10, 2013 at 4:40 PM, mrevilgnome <mrevilgn...@gmail.com> wrote:

> What do you think about set manipulation via indexes in Cassandra? I'm
> interested in answering queries such as "give me all users that performed
> event 1, 2, and 3, but not 4." If the answer is yes then I can make a case
> for spending my time on C*. The only downside for us would be that our
> current prototype is in C++, so we would lose some performance and the
> ability to dedicate an entire machine to caching/performing queries.
>
>
> On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> If you mean, "Can someone help me figure out how to get started updating
>> these old patches to trunk and cleaning out the Avro?" then yes, I've been
>> knee-deep in indexing code recently.
>>
>>
>> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <mrevilgn...@gmail.com>
>> wrote:
>>
>> > I'm currently building a distributed cluster on top of Cassandra to
>> > perform fast set manipulation via bitmap indexes. This gives me the
>> > ability to perform unions, intersections, and set subtraction across
>> > sub-queries. Currently I'm storing index information for thousands of
>> > dimensions as Cassandra rows, and my cluster keeps this information
>> > cached, distributed, and replicated in order to answer queries.
>> >
>> > Every couple of days I think to myself this should really exist in C*.
>> > Given all the benefits, would there be any interest in
>> > reviving CASSANDRA-1472?
>> >
>> > Some downsides are that this is very memory intensive, even for sparse
>> > bitmaps.
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
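
For readers following the thread, the set manipulation being discussed (unions, intersections, and subtraction over per-event bitmaps) can be sketched roughly as below. This is only an illustration under stated assumptions, not how Cassandra or the prototype mentioned above works: the `bitmap` and `members` helpers and the sample user ids are hypothetical, and a plain Python integer stands in for a real (compressed) bitmap. It also shows that "performed events 1, 2, and 3" read as an intersection gives a different answer from the `IN (1, 2, 3)` query, which is a union.

```python
# Hypothetical helpers: a Python int is used as a bitmap, bit i set
# means user id i performed the event. Real bitmap indexes would use
# a compressed representation instead.

def bitmap(user_ids):
    """Pack an iterable of non-negative user ids into an int bitmap."""
    bits = 0
    for uid in user_ids:
        bits |= 1 << uid
    return bits


def members(bits):
    """Return the sorted user ids whose bits are set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Made-up sample data: event type -> bitmap of users who performed it.
events = {
    1: bitmap([0, 1, 2, 5]),
    2: bitmap([1, 2, 3]),
    3: bitmap([1, 2, 5, 7]),
    4: bitmap([2, 7]),
}

# "Performed event 1, 2, and 3, but not 4": intersection, then subtraction.
all_three_not_four = events[1] & events[2] & events[3] & ~events[4]

# "type IN (1, 2, 3) and NOT IN (type = 4)": union, then subtraction.
any_of_three_not_four = (events[1] | events[2] | events[3]) & ~events[4]

print(members(all_three_not_four))
print(members(any_of_three_not_four))
```

The bitwise work itself is a handful of AND/OR/NOT operations, which is consistent with the point made above that query planning, rather than the bitmap operations, is the harder part.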