Thanks for the advice. Totally makes sense. Once I figure out how to stop my data from taking up more than 2x the space it should (with none of that extra space doing anything useful), I'll definitely make the change :)
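In case it helps anyone else hitting this, here is roughly how I have been comparing what Cassandra reports against what is actually on disk. The keyspace/table names below are placeholders and the paths assume the default data directory:

    # Size as Cassandra reports it (the "Space used" lines in cfstats)
    nodetool cfstats my_ks.my_table | grep "Space used"

    # Size actually sitting on disk for that table's data directory
    du -sh /var/lib/cassandra/data/my_ks/my_table-*

    # Any space still held by snapshots (should be nothing after clearsnapshot)
    du -sh /var/lib/cassandra/data/my_ks/my_table-*/snapshots 2>/dev/null

When the du number is roughly double the cfstats number, that is the gap I am trying to explain.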
Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Well, I personally don't like RF=2. It means that if you're using CL=QUORUM and a node goes down, you're going to have a bad time (downtime). If you're using CL=ONE then you'd be OK. However, I am not wild about losing a node and having only 1 copy of my data available in prod.
>
> On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder <n...@whistle.com> wrote:
>
>> Thanks Jonathan. So there is nothing too idiotic about my current set-up of 6 boxes, each with 256 vnodes, and an RF of 2?
>>
>> I appreciate the help,
>> Nate
>>
>> --
>> *Nathanael Yoder*
>> Principal Engineer & Data Scientist, Whistle
>> 415-944-7344 // n...@whistle.com
>>
>> On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>>> You don't need a prime number of nodes in your ring, but it's not a bad idea for it to be a multiple of your RF when your cluster is small.
>>>
>>> On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder <n...@whistle.com> wrote:
>>>
>>>> Hi Ian,
>>>>
>>>> Thanks for the suggestion, but I had actually already done that prior to the scenario I described (to get myself some free space), and when I ran nodetool cfstats it listed 0 snapshots as expected, so unfortunately I don't think that is where my space went.
>>>>
>>>> One additional piece of information I forgot to point out: when I ran nodetool status on the node, it included all 6 nodes.
>>>>
>>>> I have also heard it mentioned that I may want a prime number of nodes, which may help protect against split-brain. Is this true? If so, does it still apply when I am using vnodes?
>>>>
>>>> Thanks again,
>>>> Nate
>>>>
>>>> --
>>>> *Nathanael Yoder*
>>>> Principal Engineer & Data Scientist, Whistle
>>>> 415-944-7344 // n...@whistle.com
>>>>
>>>> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose <ianr...@fullstory.com> wrote:
>>>>
>>>>> Try `nodetool clearsnapshot`, which will delete any snapshots you have. I have never taken a snapshot with nodetool, yet I recently found several snapshots on my disk (and they can take up a lot of space). So perhaps they are automatically generated by some operation? No idea. Regardless, nuking those freed up a ton of space for me.
>>>>>
>>>>> - Ian
>>>>>
>>>>> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder <n...@whistle.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am new to Cassandra, so I apologise in advance if I have missed anything obvious, but this one currently has me stumped.
>>>>>>
>>>>>> I am currently running a 6-node Cassandra 2.1.1 cluster on EC2 using C3.2XLarge nodes, which overall is working very well for us. However, after letting it run for a while I seem to get into a situation where the amount of disk space used far exceeds the total amount of data on each node, and I haven't been able to get the size back down except by stopping and restarting the node.
>>>>>>
>>>>>> For example, almost all of my data is in one table. On one of my nodes right now the total space used (as reported by nodetool cfstats) is 57.2 GB and there are no snapshots. However, when I look at the size of the data files (using du), the data file for that table is 107 GB.
>>>>>> Because the C3.2XLarge only have 160 GB of SSD, you can see why this quickly becomes a problem.
>>>>>>
>>>>>> Running nodetool compact didn't reduce the size, and neither did running nodetool repair -pr on the node. I also tried nodetool flush and nodetool cleanup (even though I have not added or removed any nodes recently), but those didn't change anything either. In order to keep my cluster up I then stopped and started that node, and the size of the data file dropped to 54 GB while the total column family size (as reported by nodetool) stayed about the same.
>>>>>>
>>>>>> Any suggestions as to what I could be doing wrong?
>>>>>>
>>>>>> Thanks,
>>>>>> Nate
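P.S. For anyone who finds this thread in the archives later: the snapshot check Ian suggested boils down to something like the following (paths assume the default data directory, and the keyspace name is a placeholder):

    # See whether any snapshot directories exist and how much space they hold
    find /var/lib/cassandra/data -type d -name snapshots | xargs -r du -sh

    # Drop all snapshots for a given keyspace (omit the keyspace to clear everything)
    nodetool clearsnapshot my_ks

In my case cfstats already showed 0 snapshots, so this didn't recover the space, but it's a quick way to rule snapshots out.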