If you're worried about management overhead, you may find Brisk interesting. It is a DataStax product that combines Cassandra 0.8 with Hadoop and Hive in an easy-to-use package. I have no Java expertise, but I find it very easy to get along with. I think whether Cassandra's column-based store suits your use case better will come down to your data model.
http://www.datastax.com/docs/0.8/brisk/about_brisk

Dave

On 29 June 2011 12:46, Evans, Matthew <mev...@verivue.com> wrote:
> Hi,
>
> LevelDB seems very interesting. Looking at the specs, it appears that it
> should be more than fast enough.
>
> Thanks
>
> Matt
>
> ________________________________________
> From: David Smith [diz...@basho.com]
> Sent: Tuesday, June 28, 2011 10:04 PM
> To: Evans, Matthew
> Cc: riak-users@lists.basho.com
> Subject: Re: Riak or Cassandra for this...
>
> On Tue, Jun 28, 2011 at 9:17 AM, Evans, Matthew <mev...@verivue.com> wrote:
> >
> > The disadvantages of Riak are:
> >
> > 1) Write performance. We need to handle ~50,000 writes per second.
>
> I would note that we are investigating the use of LevelDB in lieu of
> InnoDB due to its superior write performance and better crash recovery.
> 50k writes per second is pretty steep, but my belief (although you
> should only trust your own benchmarking!) is that LevelDB will get you
> close with sufficient nodes.
>
> > I would recommend running our client app from within the same Erlang VM
> > as Riak so hopefully we can gain something here. Alternatively, use the
> > innostore Erlang API directly for writes.
>
> I would recommend against running in the same Erlang VM, if at all
> possible. Running within the same Erlang VM makes it much more
> difficult for you to distinguish the performance/latency characteristics
> of your app from those of Riak. It also means that your app or Riak could
> adversely affect one another if either causes the VM to crash; you limit
> the possibilities for a robust system. We encourage people to use the
> Protocol Buffers (PBC) API if performance/latency is a concern -- odds
> are you'll spend more time waiting on disk I/O than messaging between
> two VMs on a loopback socket once the dataset gets big.
>
> Of course, there are always points in time where it's OK to break the
> rules -- engineering is the art of tradeoffs, after all. :) I would
> just suggest that you shouldn't start with a single VM -- first measure
> performance over PBC and see where you can get.
>
> > Questions:
> >
> > 1) Is Riak a good database for this application?
>
> There are certainly people who are using Riak for large, time-ordered
> datasets and are (insofar as I know) generally happy with their choice
> -- for all the advantages you listed earlier. From a features
> standpoint, we are also working on a number of features that would
> further improve Riak's performance and applicability to this domain
> (improvements to MapReduce on large datasets, secondary indexing for
> more efficient range searches, availability of LevelDB, etc.).
>
> > 2) Can we write to InnoDB directly and still leverage the map-reduce
> > queries on the data?
>
> Nope, sorry. Riak does essential coordination between nodes and needs
> to manage the data storage aspect to ensure predictable semantics.
>
> Hope that helps,
>
> D.
>
> --
> Dave Smith
> Director, Engineering
> Basho Technologies, Inc.
> diz...@basho.com
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
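The advice in the thread to trust only your own benchmarks, measured over PBC, can be prepared for with a rough capacity estimate plus a small timing harness. The following is a minimal Python sketch, not anything published by Basho. It assumes Riak's default n_val of 3 (each logical write is stored on three replicas), a hypothetical 10-node cluster, and evenly spread load. The helpers `per_node_write_load` and `measure_writes_per_sec` are hypothetical names, and the `write` callable is a placeholder you would swap for a real PBC client call.

```python
import time
from typing import Callable, Dict


def per_node_write_load(client_writes_per_sec: float, n_val: int, nodes: int) -> float:
    """Estimate replica writes per second each node must absorb.

    Assumes load is spread evenly: every logical write is stored n_val
    times, so the cluster performs client_writes_per_sec * n_val backend
    writes per second in total, divided across `nodes` machines.
    """
    if nodes < 1 or n_val < 1:
        raise ValueError("nodes and n_val must both be at least 1")
    return client_writes_per_sec * n_val / nodes


def measure_writes_per_sec(write: Callable[[str, bytes], None], count: int) -> float:
    """Time `count` sequential calls to `write` and return the observed rate.

    `write` is a hypothetical stand-in for one PUT (e.g. over PBC). A single
    sequential loop only measures latency-bound throughput for one client.
    """
    payload = b"x" * 100  # hypothetical 100-byte value
    start = time.perf_counter()
    for i in range(count):
        write(f"key-{i}", payload)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # 50,000 client writes/s with n_val=3 on a hypothetical 10-node cluster.
    print(per_node_write_load(50_000, n_val=3, nodes=10))  # 15000.0

    # In-memory stub; replace with a real client call when benchmarking Riak.
    store: Dict[str, bytes] = {}
    rate = measure_writes_per_sec(lambda k, v: store.__setitem__(k, v), 10_000)
    print(f"{rate:,.0f} writes/s against the in-memory stub")
```

A single sequential loop like this measures one client's latency-bound rate; approaching ~50,000 writes per second would likely require many such loops running concurrently across client machines, which is why the per-node replica estimate is worth checking against your measured numbers.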