Thanks for the advice. Totally makes sense. Once I figure out how to stop my data from taking up more than 2x the space it should (with none of that extra space doing anything useful), I'll definitely make the change :)
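In case it helps anyone else hitting this, here is roughly how I have been comparing what Cassandra reports against what is actually on disk. The keyspace/table names below are placeholders and the paths assume the default data directory:

    # Size as Cassandra reports it (the "Space used" lines in cfstats)
    nodetool cfstats my_ks.my_table | grep "Space used"

    # Size actually sitting on disk for that table's data directory
    du -sh /var/lib/cassandra/data/my_ks/my_table-*

    # Any space still held by snapshots (should be nothing after clearsnapshot)
    du -sh /var/lib/cassandra/data/my_ks/my_table-*/snapshots 2>/dev/null

When the du number is roughly double the cfstats number, that is the gap I am trying to explain.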
Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Well, I personally don't like RF=2. It means that if you're using CL=QUORUM and a node goes down, you're going to have a bad time (downtime). If you're using CL=ONE then you'd be OK. However, I am not wild about losing a node and having only 1 copy of my data available in prod.
>
> On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder <n...@whistle.com> wrote:
>
>> Thanks Jonathan. So there is nothing too idiotic about my current set-up of 6 boxes, each with 256 vnodes, and an RF of 2?
>>
>> I appreciate the help,
>> Nate
>>
>> --
>> *Nathanael Yoder*
>> Principal Engineer & Data Scientist, Whistle
>> 415-944-7344 // n...@whistle.com
>>
>> On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>>> You don't need a prime number of nodes in your ring, but it's not a bad idea for it to be a multiple of your RF when your cluster is small.
>>>
>>> On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder <n...@whistle.com> wrote:
>>>
>>>> Hi Ian,
>>>>
>>>> Thanks for the suggestion, but I had actually already done that prior to the scenario I described (to get myself some free space), and when I ran nodetool cfstats it listed 0 snapshots as expected, so unfortunately I don't think that is where my space went.
>>>>
>>>> One additional piece of information I forgot to point out: when I ran nodetool status on the node, it included all 6 nodes.
>>>>
>>>> I have also heard it mentioned that I may want a prime number of nodes, which may help protect against split-brain. Is this true? If so, does it still apply when I am using vnodes?
>>>>
>>>> Thanks again,
>>>> Nate
>>>>
>>>> --
>>>> *Nathanael Yoder*
>>>> Principal Engineer & Data Scientist, Whistle
>>>> 415-944-7344 // n...@whistle.com
>>>>
>>>> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose <ianr...@fullstory.com> wrote:
>>>>
>>>>> Try `nodetool clearsnapshot`, which will delete any snapshots you have. I have never taken a snapshot with nodetool, yet I recently found several snapshots on my disk (and they can take up a lot of space). So perhaps they are automatically generated by some operation? No idea. Regardless, nuking those freed up a ton of space for me.
>>>>>
>>>>> - Ian
>>>>>
>>>>> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder <n...@whistle.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am new to Cassandra, so I apologise in advance if I have missed anything obvious, but this one currently has me stumped.
>>>>>>
>>>>>> I am currently running a 6-node Cassandra 2.1.1 cluster on EC2 using C3.2XLarge nodes, which overall is working very well for us. However, after letting it run for a while I seem to get into a situation where the amount of disk space used far exceeds the total amount of data on each node, and I haven't been able to get the size back down except by stopping and restarting the node.
>>>>>>
>>>>>> For example, almost all of my data is in one table. On one of my nodes right now the total space used (as reported by nodetool cfstats) is 57.2 GB and there are no snapshots. However, when I look at the size of the data files (using du), the data file for that table is 107 GB.
>>>>>> Because the C3.2XLarge only have 160 GB of SSD, you can see why this quickly becomes a problem.
>>>>>>
>>>>>> Running nodetool compact didn't reduce the size, and neither did running nodetool repair -pr on the node. I also tried nodetool flush and nodetool cleanup (even though I have not added or removed any nodes recently), but those didn't change anything either. In order to keep my cluster up I then stopped and started that node, and the size of the data file dropped to 54 GB while the total column family size (as reported by nodetool) stayed about the same.
>>>>>>
>>>>>> Any suggestions as to what I could be doing wrong?
>>>>>>
>>>>>> Thanks,
>>>>>> Nate
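P.S. For anyone who finds this thread in the archives later: the snapshot check Ian suggested boils down to something like the following (paths assume the default data directory, and the keyspace name is a placeholder):

    # See whether any snapshot directories exist and how much space they hold
    find /var/lib/cassandra/data -type d -name snapshots | xargs -r du -sh

    # Drop all snapshots for a given keyspace (omit the keyspace to clear everything)
    nodetool clearsnapshot my_ks

In my case cfstats already showed 0 snapshots, so this didn't recover the space, but it's a quick way to rule snapshots out.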