One update. I cleared the snapshots using the nodetool clearsnapshot command. The disk space has now been recovered.
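(For reference, the exact invocation can differ a little between versions, but what I ran was along the lines of:

./bin/nodetool clearsnapshot

With no arguments it removes the snapshots for all keyspaces on the node; a keyspace name can be passed as an argument to restrict it.)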
Because of this issue, I have mounted one more drive on the server and there are some data files on it. How can I migrate the data so that I can decommission that drive? Will it work if I just copy all the contents of the table directories to one of the other drives?

Thanks for all the help.

Regards,
Rahul

On Thursday 14 January 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:

> Hi Jan,
> I checked it. There are no old keyspaces or tables.
> Thanks for your pointer, I started looking inside the directories. I see
> a lot of snapshot directories inside the table directories. These
> directories are consuming the space.
>
> However, these snapshots are not shown when I issue listsnapshots:
> ./bin/nodetool listsnapshots
> Snapshot Details:
> There are no snapshots
>
> Can I safely delete those snapshots? Why is listsnapshots not showing the
> snapshots? Also, in future, how can we find out whether there are
> snapshots?
>
> Thanks,
> Rahul
>
> On Thu, Jan 14, 2016 at 12:50 PM, Jan Kesten <j.kes...@enercast.de> wrote:
>
>> Hi Rahul,
>>
>> Just an idea: did you have a look at the data directories on disk
>> (/var/lib/cassandra/data)? It could be that there are some from old
>> keyspaces that have been deleted and snapshotted before. Try something
>> like "du -sh /var/lib/cassandra/data/*" to verify which keyspace is
>> consuming your space.
>>
>> Jan
>>
>> Sent from my iPhone
>>
>> On 14.01.2016 at 07:25, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>
>> Thanks for your suggestion.
>>
>> Compaction was happening on one of the large tables. The disk space did
>> not decrease much after the compaction, so I ran an external compaction.
>> The disk space decreased by around 10%. However, it is still consuming
>> close to 750 GB for a load of 250 GB.
>>
>> I even restarted Cassandra, thinking there might be some open files.
>> However, it didn't help much.
>>
>> Is there any way to find out why so much data is being consumed?
>>
>> I checked whether there are any open files using lsof. There are not any
>> open files.
>>
>> Recovery:
>> Just a wild thought: I am using a replication factor of 2 and I have two
>> nodes. If I delete the complete data on one of the nodes, will I be able
>> to recover all the data from the active node?
>> I don't want to pursue this path, as I want to find out the root cause
>> of the issue!
>>
>> Any help will be greatly appreciated.
>>
>> Thank you,
>> Rahul
>>
>> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo <r...@pythian.com> wrote:
>>
>>> You can check if the snapshot exists in the snapshot folder.
>>> Repairs stream SSTables over, which can temporarily increase disk space.
>>> But I think Carlos Alonso might be correct: running compactions might be
>>> the issue.
>>>
>>> Regards,
>>>
>>> Carlos Juzarte Rolo
>>> Cassandra Consultant
>>>
>>> Pythian - Love your data
>>>
>>> rolo@pythian | Twitter: @cjrolo | LinkedIn:
>>> linkedin.com/in/carlosjuzarterolo
>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>> www.pythian.com
>>>
>>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso <i...@mrcalonso.com>
>>> wrote:
>>>
>>>> I'd also have a look at possible running compactions.
>>>>
>>>> If you have big column families with STCS then large compactions may
>>>> be happening.
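>>>>
>>>> As a rough illustration of the space involved: a compaction writes its
>>>> new SSTable before the input SSTables are deleted, so compacting, say,
>>>> four 50 GB SSTables can temporarily require up to roughly 200 GB of
>>>> additional disk space until the old files are removed.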
>>>>
>>>> Check it with nodetool compactionstats
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>> On 13 January 2016 at 05:22, Kevin O'Connor <ke...@reddit.com> wrote:
>>>>
>>>>> Have you tried restarting? It's possible there are open file handles
>>>>> to SSTables that have been compacted away. You can verify this by
>>>>> running lsof and grepping for DEL or deleted.
>>>>>
>>>>> If it's not that, you can run nodetool cleanup on each node to scan
>>>>> all of the SSTables on disk and remove anything that the node is not
>>>>> responsible for. Generally this would only help if you added nodes
>>>>> recently.
>>>>>
>>>>> On Tuesday, January 12, 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>>>>
>>>>>> We have a 2-node Cassandra cluster with a replication factor of 2.
>>>>>>
>>>>>> The load on the nodes is around 350 GB:
>>>>>>
>>>>>> Datacenter: Cassandra
>>>>>> ==========
>>>>>> Address      Rack   Status  State   Load      Owns     Token
>>>>>>                                                         -5072018636360415943
>>>>>> 172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%  -7068746880841807701
>>>>>> 172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%  -5072018636360415943
>>>>>>
>>>>>> However, if I use df -h:
>>>>>>
>>>>>> /dev/xvdf  252G  223G  17G  94%  /HDD1
>>>>>> /dev/xvdg  493G  456G  12G  98%  /HDD2
>>>>>> /dev/xvdh  197G  167G  21G  90%  /HDD3
>>>>>>
>>>>>> HDD1, 2 and 3 contain only Cassandra data. It amounts to close to
>>>>>> 1 TB on one of the machines, and on the other machine it is close to
>>>>>> 650 GB.
>>>>>>
>>>>>> I started a repair 2 days ago; after running the repair, the amount
>>>>>> of disk space consumed has actually increased.
>>>>>> I also checked whether this is because of snapshots. nodetool
>>>>>> listsnapshots intermittently lists a snapshot, but it goes away after
>>>>>> some time.
>>>>>>
>>>>>> Can somebody please help me understand:
>>>>>> 1. Why is so much disk space consumed?
>>>>>> 2. Why did it increase after the repair?
>>>>>> 3. Is there any way to recover from this state?
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul