Thanks for your suggestion. Compaction was happening on one of the large tables, but the disk space did not decrease much after it finished. So I ran an external compaction, and the disk space decreased by around 10%. However, the node is still consuming close to 750 GB for a load of 250 GB.
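For reference, these are the kinds of checks I am running to narrow down where the space is going (a rough sketch only; the keyspace/table name below is a placeholder, on newer Cassandra versions cfstats is called tablestats, and the lsof line assumes a single Cassandra process):

    # Live vs. total space per table; a big gap suggests obsolete sstables
    nodetool cfstats my_keyspace.my_large_table | grep 'Space used'

    # Any snapshots or incremental backups still sitting on disk?
    nodetool listsnapshots
    find /HDD1 /HDD2 /HDD3 -type d \( -name snapshots -o -name backups \) -exec du -sh {} +

    # Deleted sstables still held open by the Cassandra process
    lsof -p "$(pgrep -f CassandraDaemon)" | grep -iE 'DEL|deleted'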
I even restarted Cassandra, thinking there might be some open files, but it didn't help much. Is there any way to find out why so much data is being consumed? I checked for open file handles using lsof; it does not show any deleted files still being held open.

*Recovery:* Just a wild thought: I am using a replication factor of 2 and I have two nodes. If I delete the complete data on one of the nodes, will I be able to recover all the data from the active node? I don't want to pursue this path, as I want to find out the root cause of the issue! (There is a rough sketch of this at the end of this mail.)

Any help will be greatly appreciated.

Thank you,
Rahul

On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo <r...@pythian.com> wrote:

> You can check if the snapshot exists in the snapshot folder.
> Repairs stream sstables over, which can temporarily increase disk space. But
> I think Carlos Alonso might be correct: running compactions might be the
> issue.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>
>> I'd also have a look at possibly running compactions.
>>
>> If you have big column families with STCS, then large compactions may be
>> happening.
>>
>> Check it with nodetool compactionstats.
>>
>> Carlos Alonso | Software Engineer | @calonso
>>
>> On 13 January 2016 at 05:22, Kevin O'Connor <ke...@reddit.com> wrote:
>>
>>> Have you tried restarting? It's possible there are open file handles to
>>> sstables that have been compacted away. You can verify by doing lsof and
>>> grepping for DEL or deleted.
>>>
>>> If it's not that, you can run nodetool cleanup on each node to scan all
>>> of the sstables on disk and remove anything that it's not responsible for.
>>> Generally this would only help if you added nodes recently.
>>>
>>> On Tuesday, January 12, 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>>
>>>> We have a 2-node Cassandra cluster with a replication factor of 2.
>>>>
>>>> The load on the nodes is around 350 GB:
>>>>
>>>> Datacenter: Cassandra
>>>> ==========
>>>> Address      Rack   Status  State   Load      Owns      Token
>>>>                                                         -5072018636360415943
>>>> 172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%   -7068746880841807701
>>>> 172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%   -5072018636360415943
>>>>
>>>> However, if I use df -h:
>>>>
>>>> /dev/xvdf  252G  223G  17G  94%  /HDD1
>>>> /dev/xvdg  493G  456G  12G  98%  /HDD2
>>>> /dev/xvdh  197G  167G  21G  90%  /HDD3
>>>>
>>>> HDD1, HDD2 and HDD3 contain only Cassandra data. It amounts to close to
>>>> 1 TB on one of the machines, and on the other machine it is close to 650 GB.
>>>>
>>>> I started a repair 2 days ago. After running the repair, the disk space
>>>> consumption has actually increased.
>>>> I also checked whether this is because of snapshots. nodetool listsnapshots
>>>> intermittently lists a snapshot, but it goes away after some time.
>>>>
>>>> Can somebody please help me understand:
>>>> 1. Why is so much disk space consumed?
>>>> 2. Why did it increase after the repair?
>>>> 3. Is there any way to recover from this state?
>>>>
>>>> Thanks,
>>>> Rahul
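P.S. On the recovery idea above: since nodetool shows each node owning 100% of the ring, the surviving node should hold a full copy, so re-streaming everything to a wiped node should be possible using the usual "replace a dead node" approach. I am not planning to do this unless nothing else works, but for completeness, here is my rough understanding of what it would involve (untested; the service name and directory paths are guesses for my setup, the exact flag should be checked against the docs for the Cassandra version in use, and with only two nodes any QUORUM reads/writes will fail while one node is down):

    # On the node to be rebuilt
    sudo service cassandra stop

    # Move the old data aside rather than deleting it outright
    # (the data_file_directories from cassandra.yaml; also move the
    #  commitlog and saved_caches directories)
    sudo mv /HDD1/data /HDD1/data.old
    sudo mv /HDD2/data /HDD2/data.old
    sudo mv /HDD3/data /HDD3/data.old

    # Start the node telling it to replace itself (its own IP), so it
    # bootstraps and streams a full copy from the other replica.
    # Note: a seed node will not bootstrap, so it would have to be removed
    # from its own seed list first. JVM option, e.g. added to cassandra-env.sh:
    #   -Dcassandra.replace_address=<this node's IP>
    sudo service cassandra start

    # Once it is back to Up/Normal, remove the option again and run a repair
    nodetool repair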