Re: How to model data to achieve specific data locality

2014-12-09 Thread Kai Wang
Some of the sequences grow so fast that sub-partition is inevitable. I may need to try different bucket sizes to get the optimal throughput. Thank you all for the advice. On Mon, Dec 8, 2014 at 9:55 AM, Eric Stevens wrote: > The upper bound for the data size of a single column is 2GB, and the up
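The sub-partitioning mentioned above can be sketched as follows. This is a minimal illustration, not the poster's schema: it assumes each element of a sequence has an integer position, and the names `partition_key`, `buckets_for_range`, and `bucket_size` are hypothetical.

```python
def partition_key(sequence_id: str, position: int, bucket_size: int) -> tuple:
    """Map an element to a (sequence_id, bucket) pair (hypothetical layout).

    Elements whose positions fall in the same bucket share a partition,
    so one fast-growing sequence is spread over several bounded partitions.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    return (sequence_id, position // bucket_size)


def buckets_for_range(start: int, end: int, bucket_size: int) -> list:
    """Buckets that must be read to cover positions start..end inclusive."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    return list(range(start // bucket_size, end // bucket_size + 1))
```

A larger `bucket_size` means fewer partitions per read but larger partitions; a smaller one bounds partition size at the cost of more partitions per range read, which is the trade-off being tuned when trying different bucket sizes.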

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Ian Rose
Try `nodetool clearsnapshot` which will delete any snapshots you have. I have never taken a snapshot with nodetool yet I found several snapshots on my disk recently (which can take a lot of space). So perhaps they are automatically generated by some operation? No idea. Regardless, nuking those

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi Ian, Thanks for the suggestion but I had actually already done that prior to the scenario I described (to get myself some free space) and when I ran nodetool cfstats it listed 0 snapshots as expected, so unfortunately I don't think that is where my space went. One additional piece of informati

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
You don't need a prime number of nodes in your ring, but it's not a bad idea for it to be a multiple of your RF when your cluster is small. On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder wrote: > Hi Ian, > > Thanks for the suggestion but I had actually already done that prior to > the scenario I descr

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks Jonathan. So there is nothing too idiotic about my current set-up with 6 boxes, each with 256 vnodes, and an RF of 2? I appreciate the help, Nate -- *Nathanael Yoder* Principal Engineer & Data Scientist, Whistle 415-944-7344 // n...@whistle.com On Tue, Dec 9, 2014 at 8:31 AM, Jonatha

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
Well, I personally don't like RF=2. It means if you're using CL=QUORUM and a node goes down, you're going to have a bad time. (downtime) If you're using CL=ONE then you'd be ok. However, I am not wild about losing a node and having only 1 copy of my data available in prod. On Tue Dec 09 2014 at
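The arithmetic behind the RF=2 concern can be worked through in a few lines. QUORUM requires floor(RF / 2) + 1 replicas to respond; the helper names below are hypothetical and the sketch only illustrates the counting, not any driver API.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at CL=QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def tolerated_failures(rf: int, required: int) -> int:
    """Replicas that can be down while `required` replicas still respond."""
    return rf - required


for rf in (2, 3):
    q = quorum(rf)
    print(
        f"RF={rf}: QUORUM needs {q} replica(s), "
        f"tolerates {tolerated_failures(rf, q)} down; "
        f"ONE tolerates {tolerated_failures(rf, 1)} down"
    )
```

With RF=2, QUORUM needs both replicas, so a single node outage makes QUORUM requests for that node's ranges fail. With RF=3, QUORUM needs two replicas and one can be down.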

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks for the advice. Totally makes sense. Once I figure out how to make my data stop taking up more than 2x more space without being useful I'll definitely make the change :) Nate On Tue, Dec

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Reynald Bourtembourg
Hi Nate, Are you using incremental backups? Extract from the documentation ( http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_incremental_t.html ): /When incremental backups are enabled (disabled by default), Cassandra hard-links each flushed SSTable to a

Observations/concerns with repair and hinted handoff

2014-12-09 Thread Robert Wille
I have spent a lot of time working with single-node, RF=1 clusters in my development. Before I deploy a cluster to our live environment, I have spent some time learning how to work with a multi-node cluster with RF=3. There were some surprises. I’m wondering if people here can enlighten me. I do

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi Reynald, Good idea but I have incremental backups turned off and other than *.db files nothing else appears to be in the data directory for that table. Is there any other output that would be helpful in helping you all help me? Thanks, Nate

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi All, Thanks for the help, but after yet another day of investigation I think I might be running into the issue described in https://issues.apache.org/jira/browse/CASSANDRA-8061, where tmplink files aren't removed until Cassandra is restarted. Thanks again for all the suggestions! Nate
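One way to check whether leftover tmplink files account for the missing space is to walk the data directory and total their sizes. This is a rough sketch under assumptions: it treats any file whose name contains "tmplink" as a match, the function name `tmplink_usage` is hypothetical, and the data directory path must be supplied by the caller since it depends on the installation.

```python
import os


def tmplink_usage(data_dir: str):
    """Return (total_bytes, [(path, size), ...]) for files named *tmplink*.

    Assumption: leftover files can be recognised by "tmplink" in the name.
    Sizes are summed per directory entry, so hard-linked files may be
    counted more than once and the total can overstate real disk usage.
    """
    total = 0
    matches = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if "tmplink" not in name:
                continue
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished between listing and stat (e.g. compaction).
                continue
            matches.append((path, size))
            total += size
    return total, matches
```

Calling `tmplink_usage(path_to_data_directory)` on each node and comparing the total with the gap between disk usage and the size reported by `nodetool cfstats` would indicate whether these files explain the difference.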

Best practice for emulating a Cassandra timeout during unit tests?

2014-12-09 Thread Clint Kelly
Hi all, I'd like to write some tests for my code that uses the Cassandra Java driver to see how it behaves if there is a read timeout while accessing Cassandra. Is there a best-practice for getting this done? I was thinking about adjusting the settings in the cluster builder to adjust the timeou

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Robert Coli
On Mon, Dec 8, 2014 at 5:12 PM, Nate Yoder wrote: > I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using > C3.2XLarge nodes which overall is working very well for us. However, after > letting it run for a while I seem to get into a situation where the amount > of disk space used

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks Rob. Definitely good advice that I wish I had come across a couple of months ago... That said, it still definitely points me in the right direction as to what to do now. On Tue, Dec 9, 2014

upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread wyang
I looked at some upgrade documentation and am a little puzzled. According to https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt, “Rolling upgrades from anything pre-2.0.7 is not supported”. Does it mean we should upgrade to 2.0.7 or later first? Can we do a rolling upgrade to 2.0.7? Do we need

Cassandra Maintenance Best practices

2014-12-09 Thread Neha Trivedi
Hi, We have a two-node cluster in production with RF=2, which means that the data is written to both nodes. It has been running for about a month now and has a good amount of data. Questions: 1. What are the best practices for maintenance? 2. Is OpsCenter required to be installed or

[Cassandra][SStableLoader Out of Heap Memory]

2014-12-09 Thread 严超
Hi, Everyone: I'm importing a CSV file into Cassandra using SStableLoader, following the example here: https://github.com/yukim/cassandra-bulkload-example/ When I try to run the sstableloader, it fails with OOM. I also changed the sstableloader.sh script (that runs t

Re: upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread Jonathan Haddad
Yes. It is, in general, a best practice to upgrade to the latest bug fix release before doing an upgrade to the next point release. On Tue Dec 09 2014 at 6:58:24 PM wyang wrote: > I looked some upgrade documentations and am a little puzzled. > > > According to > https://github.com/apache/cassan

Re: Cassandra Maintenance Best practices

2014-12-09 Thread Jonathan Haddad
I did a presentation on diagnosing performance problems in production at the US & Euro summits, in which I covered quite a few tools & preventative measures you should know when running a production cluster. You may find it useful: http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosi