Cassandra is an excellent choice for write-heavy applications. Reading large sets of data is neither as fast nor as easy: you may need to have your client page through it, and you may need slice queries and a properly designed primary key and indexes, all of which have to be thought out in advance.
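As a rough illustration of what "having your client page through it" can look like, here is a minimal sketch in Python. It does not use any real Cassandra driver: `make_fetcher`, `fetch_page` and `iter_rows` are hypothetical names, and the in-memory list only stands in for a table whose rows are ordered by key. The assumption is that each call to `fetch_page` corresponds to one bounded slice query that resumes after the last key already seen.

```python
from bisect import bisect_right
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical row shape: (ordering key, payload).
Row = Tuple[int, str]
FetchPage = Callable[[Optional[int], int], List[Row]]


def make_fetcher(table: List[Row]) -> FetchPage:
    """Build a stand-in for one bounded slice query.

    `table` must be sorted by key. A real client would issue a query
    restricted to keys greater than `start_after` with a row limit.
    """
    keys = [key for key, _ in table]

    def fetch_page(start_after: Optional[int], page_size: int) -> List[Row]:
        # Resume strictly after the last key the caller has seen.
        start = 0 if start_after is None else bisect_right(keys, start_after)
        return table[start:start + page_size]

    return fetch_page


def iter_rows(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Row]:
    """Yield every row, fetching at most `page_size` rows per request."""
    last_key: Optional[int] = None
    while True:
        page = fetch_page(last_key, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to read.
            return
        last_key = page[-1][0]


if __name__ == "__main__":
    table = [(i, f"trace-{i}") for i in range(250)]
    rows = list(iter_rows(make_fetcher(table), page_size=100))
    print(len(rows))
```

The point of resuming from the last key, rather than from a numeric offset, is that each request stays a bounded range read. That is also why the key layout has to be decided up front.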
Regards,
Arthur

From: Carl Lerche
Sent: Thursday, August 01, 2013 3:03 PM
To: user@cassandra.apache.org ; Arthur Zubarev
Subject: Re: How often to run `nodetool repair`

Arthur,

Yes, my use case for this Cassandra cluster is analytics. I am building a system like Google Dapper (application tracing). I collect application traces and write them to Cassandra. Then, I have periodic rollup tasks that read the data, do some summarization and write it back.

Thoughts on how to manage a write-heavy cluster?

Thanks,
Carl

On Thu, Aug 1, 2013 at 11:28 AM, Arthur Zubarev <arthur.zuba...@aol.com> wrote:

Hi Carl,

The ‘repair’ is for data reads. Compaction will take care of the expired data.

The fact that a repair runs long makes me think the nodes are receiving unbalanced amounts of writes instead.

Regards,
Arthur

From: Carl Lerche
Sent: Thursday, August 01, 2013 12:35 PM
To: user@cassandra.apache.org
Subject: How often to run `nodetool repair`

Hello,

I read in the docs that `nodetool repair` should be run regularly unless no delete is ever performed. In my app, I never delete, but I heavily use the TTL feature. Should repair still be run regularly?

Also, does repair take less time if it is run regularly? If not, is there a way to run it incrementally? It seems that when I do run repair, it takes a long time and causes high amounts of CPU usage and iowait.

Thoughts?

Thanks,
Carl