Re: Riak or Cassandra for this...

Dave Gardner Wed, 29 Jun 2011 05:30:41 -0700

If you're worried about management overhead, you may find Brisk interesting.
This is a DataStax product which combines Cassandra 0.8 with Hadoop and Hive
in an easy-to-use package. I have no Java expertise but I find it very easy
to get along with. I think the deciding factor will be the data model, and
whether Cassandra's column-based store suits your use case better.


http://www.datastax.com/docs/0.8/brisk/about_brisk

Dave

On 29 June 2011 12:46, Evans, Matthew <mev...@verivue.com> wrote:

> Hi,
>
> LevelDB seems very interesting. Looking at the specs it appears that it
> should be more than fast enough.
>
> Thanks
>
> Matt
>
> ________________________________________
> From: David Smith [diz...@basho.com]
> Sent: Tuesday, June 28, 2011 10:04 PM
> To: Evans, Matthew
> Cc: riak-users@lists.basho.com
> Subject: Re: Riak or Cassandra for this...
>
> On Tue, Jun 28, 2011 at 9:17 AM, Evans, Matthew <mev...@verivue.com> wrote:
> >
> > The disadvantages of Riak are:
> >
> > 1) Write performance. We need to handle ~50,000 writes per second.
>
> I would note that we are investigating the use of LevelDB in lieu of
> InnoDB due to superior write performance and better crash recovery.
> 50k writes per second is pretty steep, but my belief (although you
> should only trust your own benchmarking!) is that LevelDB will get you
> close with sufficient nodes.
>
> > I would recommend running our client app from within the same Erlang VM
> > as Riak so hopefully we can gain something here. Alternatively, use the
> > innostore Erlang API directly for writes.
>
> I would recommend against running in the same Erlang VM, if at all
> possible. Running within the same Erlang VM makes it much more
> difficult for you to distinguish performance/latency characteristics
> of your app versus Riak. It also means that your app or Riak could
> adversely affect one another if they cause the VM to crash; you limit
> possibilities for a robust system. We encourage people to use the
> Protocol Buffers (PBC) API if performance/latency is a concern -- odds
> are you'll spend more time waiting on disk I/O than messaging between
> two VMs on a loopback socket once the dataset gets big.
>
> Of course, there are always points in time where it's ok to break the
> rules -- engineering is the art of tradeoffs, after all. :) I would
> just suggest that you shouldn't start with a single VM -- measure
> performance over PBC first and see where you can get.
>
> > Questions:
> >
> > 1) Is Riak a good database for this application?
>
> There are certainly people who are using Riak for large, time-ordered
> datasets and are (insofar as I know) generally happy with their choice
> -- for all the advantages you listed earlier. From a features
> standpoint, we are also working on a number of features that would
> further improve Riak's performance and applicability to this domain
> (improvements to MapReduce on large datasets, secondary indexing for
> more efficient range searches, availability of LevelDB, etc).
>
> >
> > 2) Can we write to InnoDB directly and still leverage the map-reduce
> > queries on the data?
>
> Nope, sorry. Riak does essential coordination between nodes and needs
> to manage the data storage aspect to ensure predictable semantics.
>
> Hope that helps,
>
> D.
>
> --
> Dave Smith
> Director, Engineering
> Basho Technologies, Inc.
> diz...@basho.com
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
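
Since the advice quoted above is to trust only your own benchmarking, here is a minimal sketch of a write-throughput harness in Python. It is not tied to any Riak or Cassandra client: `write_fn` is a hypothetical callable that you would replace with your real client's put operation, and the in-memory dict below is only a stand-in so the script runs as-is. The `nodes_needed` helper assumes throughput scales linearly with node count, which is optimistic, and the 4,000 writes/sec per-node figure is made up purely for illustration.

```python
import math
import time


def benchmark_writes(write_fn, n_ops, value_size=512):
    """Call write_fn(key, value) n_ops times; return throughput and latency stats.

    write_fn is a placeholder for whatever put/insert call your client exposes.
    """
    if n_ops <= 0:
        raise ValueError("n_ops must be positive")
    value = b"x" * value_size
    latencies = []
    start = time.perf_counter()
    for i in range(n_ops):
        t0 = time.perf_counter()
        write_fn(f"key-{i}".encode(), value)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "ops": n_ops,
        "writes_per_sec": n_ops / elapsed,
        "p50_ms": latencies[n_ops // 2] * 1000,
        "p99_ms": latencies[min(n_ops - 1, int(n_ops * 0.99))] * 1000,
    }


def nodes_needed(target_writes_per_sec, per_node_writes_per_sec):
    """Nodes required for the target rate, ASSUMING linear scaling."""
    return math.ceil(target_writes_per_sec / per_node_writes_per_sec)


if __name__ == "__main__":
    store = {}  # in-memory stand-in, NOT a real database
    print(benchmark_writes(store.__setitem__, 10_000))
    # Hypothetical per-node rate of 4,000 writes/sec against the 50k target.
    print(nodes_needed(50_000, 4_000))
```

A single-threaded loop like this measures per-connection latency more than cluster capacity, so a real test would run several of these concurrently against each node and use production-sized values.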
