Re: Cassandra Maintenance Best practices

2014-12-09 Thread Jonathan Haddad
I did a presentation on diagnosing performance problems in production at the US & Euro summits, in which I covered quite a few tools & preventative measures you should know when running a production cluster. You may find it useful: http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosi

Re: upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread Jonathan Haddad
Yes. It is, in general, a best practice to upgrade to the latest bug fix release before doing an upgrade to the next point release. On Tue Dec 09 2014 at 6:58:24 PM wyang wrote: > I looked at some upgrade documentation and am a little puzzled. > > > According to > https://github.com/apache/cassan

[Cassandra][SStableLoader Out of Heap Memory]

2014-12-09 Thread 严超
Hi, everyone: I'm importing a CSV file into Cassandra using sstableloader, following the example here: https://github.com/yukim/cassandra-bulkload-example/ When I try to run sstableloader, it fails with OOM. I also changed the sstableloader.sh script (that runs t

Cassandra Maintenance Best practices

2014-12-09 Thread Neha Trivedi
Hi, We have a two-node cluster in production with RF=2, which means that the data is written to both nodes. It has been running for about a month now and holds a good amount of data. Questions: 1. What are the best practices for maintenance? 2. Is OpsCenter required to be installed or

upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread wyang
I looked at some upgrade documentation and am a little puzzled. According to https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt, “Rolling upgrades from anything pre-2.0.7 is not supported”. Does that mean we should upgrade to 2.0.7 or later first? Can we do a rolling upgrade to 2.0.7? Do we need
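The version rule quoted from NEWS.txt can be expressed as a small pre-flight check. This is only a sketch of one reading of that sentence: the helper names `parse_version` and `needs_intermediate_upgrade` are hypothetical, and the parser assumes plain dotted numeric versions such as 2.0.6.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version like '2.0.6' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Per the NEWS.txt sentence quoted above, rolling upgrades to 2.1
# are not supported from anything older than 2.0.7.
MIN_ROLLING_SOURCE = parse_version("2.0.7")


def needs_intermediate_upgrade(current: str) -> bool:
    """True if the node should first move to 2.0.7 or later (hypothetical helper)."""
    return parse_version(current) < MIN_ROLLING_SOURCE


print(needs_intermediate_upgrade("2.0.6"))   # a 2.0.6 node is below the 2.0.7 floor
print(needs_intermediate_upgrade("2.0.11"))  # tuple comparison avoids the "11" < "7" string trap
```

Comparing tuples of integers rather than raw strings matters here, because a string comparison would wrongly rank 2.0.11 below 2.0.7.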

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks Rob. Definitely good advice that I wish I had come across a couple of months ago... That said, it still definitely points me in the right direction as to what to do now. -- *Nathanael Yoder* Principal Engineer & Data Scientist, Whistle 415-944-7344 // n...@whistle.com On Tue, Dec 9, 2014

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Robert Coli
On Mon, Dec 8, 2014 at 5:12 PM, Nate Yoder wrote: > I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using > C3.2XLarge nodes which overall is working very well for us. However, after > letting it run for a while I seem to get into a situation where the amount > of disk space used

Best practice for emulating a Cassandra timeout during unit tests?

2014-12-09 Thread Clint Kelly
Hi all, I'd like to write some tests for my code that uses the Cassandra Java driver to see how it behaves if there is a read timeout while accessing Cassandra. Is there a best-practice for getting this done? I was thinking about adjusting the settings in the cluster builder to adjust the timeou

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi All, Thanks for the help but after yet another day of investigation I think I might be running into this https://issues.apache.org/jira/browse/CASSANDRA-8061 issue where tmplink files aren't removed until Cassandra is restarted. Thanks again for all the suggestions! Nate -- *Nathanael Yoder*

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi Reynald, Good idea but I have incremental backups turned off and other than *.db files nothing else appears to be in the data directory for that table. Is there any other output that would be helpful in helping you all help me? Thanks, Nate -- *Nathanael Yoder* Principal Engineer & Data Scie

Observations/concerns with repair and hinted handoff

2014-12-09 Thread Robert Wille
I have spent a lot of time working with single-node, RF=1 clusters in my development. Before I deploy a cluster to our live environment, I have spent some time learning how to work with a multi-node cluster with RF=3. There were some surprises. I’m wondering if people here can enlighten me. I do

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Reynald Bourtembourg
Hi Nate, Are you using incremental backups? Extract from the documentation ( http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_incremental_t.html ): /When incremental backups are enabled (disabled by default), Cassandra hard-links each flushed SSTable to a

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks for the advice. Totally makes sense. Once I figure out how to make my data stop taking up more than 2x the space without being useful I'll definitely make the change :) Nate -- *Nathanael Yoder* Principal Engineer & Data Scientist, Whistle 415-944-7344 // n...@whistle.com On Tue, Dec

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
Well, I personally don't like RF=2. It means if you're using CL=QUORUM and a node goes down, you're going to have a bad time. (downtime) If you're using CL=ONE then you'd be ok. However, I am not wild about losing a node and having only 1 copy of my data available in prod. On Tue Dec 09 2014 at
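The arithmetic behind the RF=2 concern can be checked directly. The sketch below uses the standard quorum size, floor(RF / 2) + 1; the function names are illustrative only and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for CL=QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerable_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerable_failures(rf)} replica(s)")
```

With RF=2 the quorum is both replicas, so losing a single replica makes QUORUM operations fail for the affected ranges, whereas RF=3 tolerates one replica being down.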

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks Jonathan. So there is nothing too idiotic about my current set-up with 6 boxes, 256 vnodes each, and an RF of 2? I appreciate the help, Nate -- *Nathanael Yoder* Principal Engineer & Data Scientist, Whistle 415-944-7344 // n...@whistle.com On Tue, Dec 9, 2014 at 8:31 AM, Jonatha

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
You don't need a prime number of nodes in your ring, but it's not a bad idea for it to be a multiple of your RF when your cluster is small. On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder wrote: > Hi Ian, > > Thanks for the suggestion but I had actually already done that prior to > the scenario I descr

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi Ian, Thanks for the suggestion but I had actually already done that prior to the scenario I described (to get myself some free space) and when I ran nodetool cfstats it listed 0 snapshots as expected, so unfortunately I don't think that is where my space went. One additional piece of informati

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Ian Rose
Try `nodetool clearsnapshot` which will delete any snapshots you have. I have never taken a snapshot with nodetool yet I found several snapshots on my disk recently (which can take a lot of space). So perhaps they are automatically generated by some operation? No idea. Regardless, nuking those

Re: How to model data to achieve specific data locality

2014-12-09 Thread Kai Wang
Some of the sequences grow so fast that sub-partitioning is inevitable. I may need to try different bucket sizes to get the optimal throughput. Thank you all for the advice. On Mon, Dec 8, 2014 at 9:55 AM, Eric Stevens wrote: > The upper bound for the data size of a single column is 2GB, and the up
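One common way to sub-partition a fast-growing sequence is to add a bucket number to the partition key. The sketch below is an assumption about what such a scheme might look like, not the schema from this thread: `sequence_id`, `position`, and `bucket_size` are hypothetical names, and the bucket size is exactly the value one would tune for throughput.

```python
def partition_key(sequence_id: str, position: int, bucket_size: int) -> tuple:
    """Map an element of a sequence to a (sequence_id, bucket) partition key.

    Elements whose positions fall in the same bucket_size-wide window share
    a partition, so no single partition grows without bound.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    return (sequence_id, position // bucket_size)


# Positions 0..999 land in bucket 0, position 1000 starts bucket 1.
print(partition_key("seq-a", 999, 1000))
print(partition_key("seq-a", 1000, 1000))
```

A smaller bucket size spreads a sequence over more partitions (more reads to reassemble it), while a larger one keeps more of it co-located at the cost of bigger partitions.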