Thanks Jonathan. So there is nothing too idiotic about my current set-up with 6 boxes, 256 vnodes each, and an RF of 2?
I appreciate the help,
Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> You don't need a prime number of nodes in your ring, but it's not a bad
> idea for it to be a multiple of your RF when your cluster is small.
>
> On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder <n...@whistle.com> wrote:
>
>> Hi Ian,
>>
>> Thanks for the suggestion, but I had actually already done that prior to
>> the scenario I described (to get myself some free space), and when I ran
>> nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
>> don't think that is where my space went.
>>
>> One additional piece of information I forgot to point out is that when I
>> ran nodetool status on the node, it included all 6 nodes.
>>
>> I have also heard it mentioned that I may want a prime number of nodes,
>> which may help protect against split-brain. Is this true? If so, does it
>> still apply when I am using vnodes?
>>
>> Thanks again,
>> Nate
>>
>> --
>> *Nathanael Yoder*
>> Principal Engineer & Data Scientist, Whistle
>> 415-944-7344 // n...@whistle.com
>>
>> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose <ianr...@fullstory.com> wrote:
>>
>>> Try `nodetool clearsnapshot`, which will delete any snapshots you have.
>>> I have never taken a snapshot with nodetool, yet I recently found several
>>> snapshots on my disk (which can take a lot of space). So perhaps they are
>>> automatically generated by some operation? No idea. Regardless, nuking
>>> those freed up a ton of space for me.
>>>
>>> - Ian
>>>
>>> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder <n...@whistle.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am new to Cassandra, so I apologise in advance if I have missed
>>>> anything obvious, but this one currently has me stumped.
>>>>
>>>> I am currently running a 6-node Cassandra 2.1.1 cluster on EC2 using
>>>> c3.2xlarge nodes, which overall is working very well for us. However,
>>>> after letting it run for a while, I seem to get into a situation where
>>>> the amount of disk space used far exceeds the total amount of data on
>>>> each node, and I haven't been able to get the size to go back down
>>>> except by stopping and restarting the node.
>>>>
>>>> For example, almost all of my data is in one table. On one of my nodes
>>>> right now, the total space used (as reported by nodetool cfstats) is
>>>> 57.2 GB and there are no snapshots. However, when I look at the size of
>>>> the data files (using du), the data file for that table is 107 GB.
>>>> Because the c3.2xlarge instances only have 160 GB of SSD, you can see
>>>> why this quickly becomes a problem.
>>>>
>>>> Running nodetool compact didn't reduce the size, and neither did
>>>> running nodetool repair -pr on the node. I also tried nodetool flush
>>>> and nodetool cleanup (even though I have not added or removed any nodes
>>>> recently), but neither changed anything. In order to keep my cluster
>>>> up, I then stopped and restarted that node, and the size of the data
>>>> file dropped to 54 GB while the total column family size (as reported
>>>> by nodetool) stayed about the same.
>>>>
>>>> Any suggestions as to what I could be doing wrong?
>>>>
>>>> Thanks,
>>>> Nate
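The discrepancy Nate describes (cfstats reporting 57.2 GB live while du sees 107 GB on disk) can be turned into a quick scripted check. Below is a minimal sketch; the `Space used (live)` line matches the format `nodetool cfstats` prints in Cassandra 2.x, but the sample excerpt, the table name `events`, and the byte figures are hypothetical stand-ins, not output from this cluster:

```python
import re

# Hypothetical cfstats excerpt -- the field names mirror what 2.x
# `nodetool cfstats` prints, but these numbers are made up for illustration.
SAMPLE_CFSTATS = """\
    Table: events
    Space used (live): 61417033177
    Space used (total): 61417033177
    Space used by snapshots (total): 0
"""


def live_space_bytes(cfstats_text: str) -> int:
    """Pull the 'Space used (live)' figure, in bytes, out of a cfstats dump."""
    match = re.search(r"Space used \(live\): (\d+)", cfstats_text)
    if match is None:
        raise ValueError("no 'Space used (live)' line found")
    return int(match.group(1))


def unaccounted_gb(live_bytes: int, on_disk_bytes: int) -> float:
    """GB of on-disk usage not explained by live SSTables."""
    return (on_disk_bytes - live_bytes) / 1024**3


live = live_space_bytes(SAMPLE_CFSTATS)          # ~57.2 GB in this sample
on_disk = 107 * 1024**3                          # e.g. from `du -sb` on the table's data dir
print(f"unaccounted space: {unaccounted_gb(live, on_disk):.1f} GB")
```

Since stopping and restarting the node is what finally freed the space, one hypothesis worth checking is that the extra usage was SSTable files compaction had already deleted but the Cassandra process still held open; on Linux, `lsof -p <cassandra-pid> | grep deleted` is the usual way to spot such files.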