Hi Jan,

I checked it. There are no old keyspaces or tables. Thanks for your pointer, I started
looking inside the directories. I see a lot of snapshot directories inside the table
directories. These directories are consuming space.

However, these snapshots are not shown when I issue listsnapshots:

./bin/nodetool listsnapshots
Snapshot Details:
There are no snapshots

Can I safely delete those snapshots? Why is listsnapshots not showing the snapshots?
Also, in future, how can we find out if there are snapshots?

Thanks,
Rahul
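One way to answer the last two questions is to size the snapshot directories directly on
disk and let nodetool drop them all at once. This is only a sketch: the
/var/lib/cassandra/data path comes from Jan's mail below, so adjust it to the HDD1/2/3
mounts used on this cluster.

    # Show every snapshots directory and how much space it holds
    find /var/lib/cassandra/data -type d -name snapshots -exec du -sh {} +

    # Remove all snapshots on this node; this only deletes the hard-linked
    # snapshot copies, never the live sstables
    nodetool clearsnapshot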
On Thu, Jan 14, 2016 at 12:50 PM, Jan Kesten <j.kes...@enercast.de> wrote:

> Hi Rahul,
>
> just an idea, did you have a look at the data directories on disk
> (/var/lib/cassandra/data)? It could be that there are some from old
> keyspaces that have been deleted and snapshotted before. Try something like
> "du -sh /var/lib/cassandra/data/*" to verify which keyspace is consuming
> your space.
>
> Jan
>
> Sent from my iPhone
>
> On 14.01.2016 at 07:25, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>
> Thanks for your suggestion.
>
> Compaction was happening on one of the large tables. The disk space did
> not decrease much after the compaction. So I ran an external compaction.
> The disk space decreased by around 10%. However, it is still consuming close
> to 750Gb for a load of 250Gb.
>
> I even restarted Cassandra thinking there may be some open files. However,
> it didn't help much.
>
> Is there any way to find out why so much data is being consumed?
>
> I checked if there are any open files using lsof. There are not any open
> files.
>
> *Recovery:*
> Just a wild thought:
> I am using a replication factor of 2 and I have two nodes. If I delete the
> complete data on one of the nodes, will I be able to recover all the data
> from the active node?
> I don't want to pursue this path as I want to find out the root cause of
> the issue!
>
> Any help will be greatly appreciated.
>
> Thank you,
> Rahul
>
> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo <r...@pythian.com> wrote:
>
>> You can check if the snapshot exists in the snapshot folder.
>> Repairs stream sstables over, which can temporarily increase disk space. But
>> I think Carlos Alonso might be correct. Running compactions might be the
>> issue.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>>
>>> I'd have a look also at possible running compactions.
>>>
>>> If you have big column families with STCS then large compactions may be
>>> happening.
>>>
>>> Check it with nodetool compactionstats.
>>>
>>> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>>>
>>> On 13 January 2016 at 05:22, Kevin O'Connor <ke...@reddit.com> wrote:
>>>
>>>> Have you tried restarting? It's possible there are open file handles to
>>>> sstables that have been compacted away. You can verify by doing lsof and
>>>> grepping for DEL or deleted.
>>>>
>>>> If it's not that, you can run nodetool cleanup on each node to scan all
>>>> of the sstables on disk and remove anything that it's not responsible for.
>>>> Generally this would only work if you added nodes recently.
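Kevin's lsof check can be made concrete along these lines. A sketch only: finding the
JVM via pgrep -f CassandraDaemon is an assumption about how Cassandra was started.

    # List files the Cassandra JVM still holds open even though they have
    # been deleted from disk (compacted-away sstables show up here)
    lsof -p "$(pgrep -f CassandraDaemon)" | grep -Ei 'DEL|deleted'

    # Only helps if nodes were added recently: drops data this node no longer owns
    nodetool cleanup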
>>>> On Tuesday, January 12, 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>>>
>>>>> We have a 2 node Cassandra cluster with a replication factor of 2.
>>>>>
>>>>> The load factor on the nodes is around 350Gb:
>>>>>
>>>>> Datacenter: Cassandra
>>>>> ==========
>>>>> Address      Rack   Status  State   Load      Owns      Token
>>>>>                                                          -5072018636360415943
>>>>> 172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%   -7068746880841807701
>>>>> 172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%   -5072018636360415943
>>>>>
>>>>> However, if I use df -h:
>>>>>
>>>>> /dev/xvdf  252G  223G  17G  94%  /HDD1
>>>>> /dev/xvdg  493G  456G  12G  98%  /HDD2
>>>>> /dev/xvdh  197G  167G  21G  90%  /HDD3
>>>>>
>>>>> HDD1, HDD2 and HDD3 contain only Cassandra data. It amounts to close to
>>>>> 1Tb on one of the machines, and on the other machine it is close to 650Gb.
>>>>>
>>>>> I started repair 2 days ago. After running repair, the amount of disk
>>>>> space consumed has actually increased.
>>>>> I also checked if this is because of snapshots. nodetool listsnapshots
>>>>> intermittently lists a snapshot but it goes away after some time.
>>>>>
>>>>> Can somebody please help me understand:
>>>>> 1. Why is so much disk space consumed?
>>>>> 2. Why did it increase after repair?
>>>>> 3. Is there any way to recover from this state?
>>>>>
>>>>> Thanks,
>>>>> Rahul
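For the three questions at the end, the replies above boil down to a couple of checks.
A sketch only, assuming nodetool is on the PATH and a 2.x-era nodetool (cfstats was
later renamed tablestats); <keyspace> is a placeholder.

    # Are large STCS compactions still pending or running?
    nodetool compactionstats

    # Per-table accounting: the gap between "Space used (live)" and
    # "Space used (total)" is space held by obsolete sstables that have not
    # been removed yet; newer versions also report snapshot space here.
    nodetool cfstats <keyspace>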