On Mon, May 10, 2010 at 1:23 PM, Peter Hsu <pe...@motivecast.com> wrote:
> Thanks for the response, Paul.
> ...
>
> * Cassandra and its siblings are weak at ad hoc queries on tables
> that you did not think to index in advance
>
> What is the normal way of dealing with this in Cassandra? Would you just
> create a new "index" and bring a big honking machine to the table to process
> all the existing data in the database and store the new "index"?

The latest version of Cassandra introduces a "map/reduce" paradigm, which is
the main tool you'd use for batch processing of data. You could either use
that to DO your ad hoc query or to process the data into an index for more
efficient ad hoc queries in the future.

* http://en.wikipedia.org/wiki/MapReduce
* http://en.wikipedia.org/wiki/Hadoop
* http://architects.dzone.com/news/cassandra-adds-hadoop

You can read criticisms of MapReduce in the first link there.

> On May 10, 2010, at 11:22 AM, Paul Prescod wrote:
>
> This is a very, very big topic. For the most part, the issues are
> covered in the various SQL versus NoSQL debates all over the Internet.
> For example:
>
> * Cassandra and its NoSQL siblings have no concept of an in-database "join"
>
> * Cassandra and its NoSQL siblings do not allow you to update
> multiple "tables" in a single transaction
>
> * Cassandra's API is specific to it, and not portable to any other data
> store
>
> * Cassandra currently has simplistic facilities to deal with various
> kinds of conflicting writes.
>
> * Cassandra is strongly optimized for multiple machine distributions,
> whereas relational databases tend to be optimized for a single
> powerful machine.
>
> * Cassandra and its siblings are weak at ad hoc queries on tables
> that you did not think to index in advance
>
> On Mon, May 10, 2010 at 11:06 AM, Peter Hsu <pe...@motivecast.com> wrote:
>
> I've seen a lot of threads and posts about why Cassandra is great. I'm
> fairly sold on the features, and the few big deployments on Cassandra give
> it a lot of credibility.
>
> However, I don't believe in magic bullets, so I really want to understand
> the potential downsides of Cassandra. Right now, I don't really have a clue
> as to what Cassandra is bad at.
> I took a look at
> http://wiki.apache.org/cassandra/CassandraLimitations which is helpful, but
> doesn't characterize its weaknesses in ways that I can really comprehend
> until I've actually used Cassandra and understand some of the internals. It
> seems that the community would benefit from being able to answer some of
> these questions in terms of real world use cases.
>
> My main questions:
>
> * Are there designs in which a SQL database out-performs or out-scales
> Cassandra?
>
> * Is there a pros vs cons page of Cassandra against an open source SQL
> database (MySQL or Postgres)?
>
> I do plan on attending the training session next Friday in Palo Alto, but
> it'd be great if I had some more food for thought before I attend.
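
To make the "process the data into an index" suggestion from the reply a bit more concrete, here is a rough sketch of the idea in plain Python. It does not use Cassandra's or Hadoop's actual APIs; the row layout, the ROWS data, and the function names (map_row, reduce_keys, build_index) are all made up for illustration. The map step emits (field value, row key) pairs, a grouping step stands in for the shuffle, and the reduce step collects the row keys for each value. In a real deployment the result would presumably be written back to the data store as a new "index" rather than kept in memory.

```python
from collections import defaultdict

# Hypothetical rows keyed by row key; "city" was never indexed in advance.
ROWS = {
    "user1": {"name": "alice", "city": "Palo Alto"},
    "user2": {"name": "bob", "city": "Berkeley"},
    "user3": {"name": "carol", "city": "Palo Alto"},
}


def map_row(row_key, columns, field):
    """Map step: emit (field value, row key) for rows that have the field."""
    if field in columns:
        yield columns[field], row_key


def reduce_keys(value, row_keys):
    """Reduce step: collect all row keys that share one field value."""
    return value, sorted(row_keys)


def build_index(rows, field):
    """Run map over every row, group by emitted value, then reduce."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for row_key, columns in rows.items():
        for value, key in map_row(row_key, columns, field):
            grouped[value].append(key)
    return dict(reduce_keys(v, ks) for v, ks in grouped.items())


# One batch pass over all existing data produces the new "index".
city_index = build_index(ROWS, "city")
```

Once built, a lookup such as city_index["Palo Alto"] answers the query without scanning every row again, which is the trade-off described above: pay for one batch pass now to make later ad hoc queries cheap.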