How does this compare with Druid? https://github.com/metamx/druid
We're currently evaluating Acunu, Vertica and Druid...
http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html

With its bitmapped indexes, Druid appears to have the most potential. They boast some pretty impressive stats, especially WRT handling "real-time" updates and adding new dimensions. They also use a compression algorithm, CONCISE, to cut down on the space requirements.
http://ricerca.mat.uniroma3.it/users/colanton/concise.html

I haven't looked too deeply into the Druid code, but I've been meaning to see if it could be backed by C*.

We'd be game to join the hunt if you pursue such a beast (with your code, or with portions of Druid).

-brian

On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote:

> What do you think about set manipulation via indexes in Cassandra? I'm
> interested in answering queries such as give me all users that performed
> event 1, 2, and 3, but not 4. If the answer is yes then I can make a case
> for spending my time on C*. The only downside for us would be our current
> prototype is in C++ so we would lose some performance and the ability to
> dedicate an entire machine to caching/performing queries.
>
>
> On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> If you mean, "Can someone help me figure out how to get started updating
>> these old patches to trunk and cleaning out the Avro?" then yes, I've been
>> knee-deep in indexing code recently.
>>
>>
>> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <mrevilgn...@gmail.com>
>> wrote:
>>
>>> I'm currently building a distributed cluster on top of Cassandra to
>>> perform fast set manipulation via bitmap indexes. This gives me the
>>> ability to perform unions, intersections, and set subtraction across
>>> sub-queries. Currently I'm storing index information for thousands of
>>> dimensions as Cassandra rows, and my cluster keeps this information
>>> cached, distributed and replicated in order to answer queries.
>>>
>>> Every couple of days I think to myself this should really exist in C*.
>>> Given all the benefits would there be any interest in
>>> reviving CASSANDRA-1472?
>>>
>>> Some downsides are that this is very memory intensive, even for sparse
>>> bitmaps.
>>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>>

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
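
For anyone following the thread: the query mrevilgnome describes (all users that performed event 1, 2, and 3, but not 4) reduces to a few bitwise operations once each event has a bitmap of user ids. Below is a minimal sketch in Python, assuming small integer user ids and made-up sample data. The names `events`, `to_bitmap`, and `from_bitmap` are illustrative only and come from neither Druid nor CASSANDRA-1472.

```python
# Hypothetical sample data: event id -> set of user ids that performed it.
events = {
    1: {0, 1, 2, 5},
    2: {1, 2, 5, 7},
    3: {1, 2, 3, 5},
    4: {2, 9},
}


def to_bitmap(user_ids):
    """Pack user ids into one int; bit i set means user i is in the set."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def from_bitmap(bitmap):
    """Return the sorted list of user ids whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# One uncompressed bitmap per event (the "index").
index = {event: to_bitmap(users) for event, users in events.items()}

# Intersection is AND, union would be OR, and subtraction is AND NOT.
result = index[1] & index[2] & index[3] & ~index[4]
matching_users = from_bitmap(result)
```

With this sample data, `matching_users` is `[1, 5]`. Plain integers are uncompressed, so the sketch sidesteps the memory cost of sparse bitmaps raised in the thread, which is the problem compression schemes such as CONCISE are meant to address.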