Hi Rahul, just an idea: did you have a look at the data directories on disk (/var/lib/cassandra/data)? There may be directories left over from old keyspaces that were deleted or snapshotted earlier. Try something like "du -sh /var/lib/cassandra/data/*" to see which keyspace is consuming your space.
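If an old snapshot turns out to be the culprit, something along these lines should show it (this assumes the default data directory layout; adjust the path if yours differs):

  du -sh /var/lib/cassandra/data/*
  find /var/lib/cassandra/data -type d -name snapshots -exec du -sh {} +
  nodetool clearsnapshot    # drops all snapshots; pass a keyspace name to limit it

Keep in mind clearsnapshot deletes data you might still want as a backup, so only run it once you are sure nothing references those snapshots. The other checks mentioned further down the thread are summarised at the bottom of this mail.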
Jan

Sent from my iPhone

> On 14.01.2016, at 07:25, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>
> Thanks for your suggestion.
>
> Compaction was happening on one of the large tables. The disk space did not
> decrease much after the compaction, so I ran an external compaction. The disk
> space decreased by around 10%. However, it is still consuming close to 750Gb
> for a load of 250Gb.
>
> I even restarted Cassandra, thinking there may be some open files. However,
> it didn't help much.
>
> Is there any way to find out why so much data is being consumed?
>
> I checked if there are any open files using lsof. There are not any open
> files.
>
> Recovery:
> Just a wild thought:
> I am using a replication factor of 2 and I have two nodes. If I delete the
> complete data on one of the nodes, will I be able to recover all the data
> from the active node?
> I don't want to pursue this path as I want to find out the root cause of the
> issue!
>
> Any help will be greatly appreciated.
>
> Thank you,
>
> Rahul
>
>> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo <r...@pythian.com> wrote:
>> You can check if the snapshot exists in the snapshot folder.
>> Repairs stream sstables over, which can temporarily increase disk space.
>> But I think Carlos Alonso might be correct. Running compactions might be
>> the issue.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>>> I'd also have a look at possible running compactions.
>>>
>>> If you have big column families with STCS then large compactions may be
>>> happening.
>>>
>>> Check it with nodetool compactionstats.
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>>
>>>> On 13 January 2016 at 05:22, Kevin O'Connor <ke...@reddit.com> wrote:
>>>> Have you tried restarting? It's possible there are open file handles to
>>>> sstables that have been compacted away. You can verify by running lsof
>>>> and grepping for DEL or deleted.
>>>>
>>>> If it's not that, you can run nodetool cleanup on each node to scan all
>>>> of the sstables on disk and remove anything that it's not responsible
>>>> for. Generally this would only help if you added nodes recently.
>>>>
>>>>> On Tuesday, January 12, 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>>>> We have a 2-node Cassandra cluster with a replication factor of 2.
>>>>>
>>>>> The load on each node is around 350Gb:
>>>>>
>>>>> Datacenter: Cassandra
>>>>> ==========
>>>>> Address      Rack   Status  State   Load      Owns     Token
>>>>>                                                        -5072018636360415943
>>>>> 172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%  -7068746880841807701
>>>>> 172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%  -5072018636360415943
>>>>>
>>>>> However, if I use df -h:
>>>>>
>>>>> /dev/xvdf  252G  223G  17G  94%  /HDD1
>>>>> /dev/xvdg  493G  456G  12G  98%  /HDD2
>>>>> /dev/xvdh  197G  167G  21G  90%  /HDD3
>>>>>
>>>>> HDD1, HDD2 and HDD3 contain only Cassandra data. It amounts to close to
>>>>> 1Tb on one of the machines, and on the other machine it is close to 650Gb.
>>>>>
>>>>> I started a repair 2 days ago; after running the repair, the disk space
>>>>> consumption has actually increased.
>>>>> I also checked if this is because of snapshots. nodetool listsnapshots
>>>>> intermittently lists a snapshot, but it goes away after some time.
>>>>>
>>>>> Can somebody please help me understand:
>>>>> 1. Why is so much disk space consumed?
>>>>> 2. Why did it increase after the repair?
>>>>> 3. Is there any way to recover from this state?
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>
>> --
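PS: for quick reference, here are the checks suggested earlier in the thread in one place (run them on each node; the lsof pattern is just one way to spot deleted-but-open files):

  nodetool compactionstats       # are large compactions still running?
  nodetool listsnapshots         # snapshots still holding disk space
  nodetool cleanup               # drops data the node no longer owns; only helps after topology changes
  lsof | grep -E 'DEL|deleted'   # sstables deleted from disk but still held open by the process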