I'm using Cassandra as a big graph database, loading large volumes of data live 
and linking on the fly. 
The number of edges grows geometrically as data is added, and those edges need 
to be read to continue linking the graph on the fly. 


Consequently, my problem is constrained by:
 * A predominantly read workload, especially once the data gets large and the 
reads become quasi-random
 * Lots of data to plow in, all of which has to be read back
 * Although the problem could scale out and possibly sit entirely in RAM, that 
requires too much kit to be viable 

So, my findings with Cassandra are:
 * Compaction is expensive. I need it, but
   1) It takes disk IO away from my reads
   2) It destroys the file cache
   I've not had a chance to do extensive tests with the leveled (LevelDB-style) 
   compaction
 * Compaction has historically been too hard to configure
 * It is memory hungry

So for me the biggest features would be:
 * Cheaper compaction
 * Lower memory usage
 * Indexing of dynamic colnames (e.g. a Lucene TermEnum against rowkey:colkey)
   I do a lot of checking against dynamic colnames
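
To make the colname indexing request concrete, here is a minimal Python sketch 
of the kind of lookup I mean: a sorted set of "rowkey:colkey" terms that can be 
seeked by prefix, in the spirit of a term enumeration. It is not Cassandra or 
Lucene code. TermIndex and its methods are hypothetical names, the index lives 
in memory only, and it assumes row keys never contain ':'.

```python
import bisect


class TermIndex:
    """Sorted in-memory index of 'rowkey:colkey' terms (hypothetical sketch)."""

    def __init__(self):
        self._terms = []  # kept sorted so lookups can use binary search

    def add(self, rowkey, colkey):
        """Insert the term for (rowkey, colkey) if it is not already present."""
        term = f"{rowkey}:{colkey}"
        i = bisect.bisect_left(self._terms, term)
        if i == len(self._terms) or self._terms[i] != term:
            self._terms.insert(i, term)

    def contains(self, rowkey, colkey):
        """Check whether a column name exists for a row without reading the row."""
        term = f"{rowkey}:{colkey}"
        i = bisect.bisect_left(self._terms, term)
        return i < len(self._terms) and self._terms[i] == term

    def columns(self, rowkey, prefix=""):
        """Enumerate column names of a row that start with the given prefix."""
        start = f"{rowkey}:{prefix}"
        i = bisect.bisect_left(self._terms, start)
        found = []
        # Seek to the first matching term, then walk forward while it matches.
        while i < len(self._terms) and self._terms[i].startswith(start):
            found.append(self._terms[i][len(rowkey) + 1:])
            i += 1
        return found


index = TermIndex()
index.add("n1", "edge:n2")
index.add("n1", "edge:n3")
index.add("n1", "attr:name")
index.add("n2", "edge:n1")
print(index.contains("n1", "edge:n2"))
print(index.columns("n1", "edge:"))
```

The point is that an existence check or a prefix scan over colnames touches 
only the term list, not the row data itself.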
 
The great features are that redundancy and live addition of shards are 
available out of the box. 


I've also experimented with GoldenOrb and triggered updates, and I think a fair 
bit can be achieved in my problem with local data access. Through GoldenOrb and 
Hadoop Writables I managed to get both a BigTable and a Pregel access model 
onto my Cassandra data. It was schema specific, but it provided a local compute 
model. 

p 


________________________________
From: Jonathan Ellis <jbel...@gmail.com>
To: user <user@cassandra.apache.org>
Sent: Tuesday, 1 November 2011, 22:59
Subject: Second Cassandra users survey

Hi all,

Two years ago I asked for Cassandra use cases and feature requests.
[1]  The results [2] have been extremely useful in setting and
prioritizing goals for Cassandra development.  But with the release of
1.0 we've accomplished basically everything from our original wish
list. [3]

I'd love to hear from modern Cassandra users again, especially if
you're usually a quiet lurker.  What does Cassandra do well?  What are
your pain points?  What's your feature wish list?

As before, if you're in stealth mode or don't want to say anything in
public, feel free to reply to me privately and I will keep it off the
record.

[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
[2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
[3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
