Hi Rahul,

it should work as you would expect - simply copy the sstables from your extra disk back to the original one. To minimize the node's downtime you can do something like this:

- rsync the files while the node is still running (sstables are immutable) to copy most of the data
- edit cassandra.yaml to remove the additional data directory
- shut down the node
- rsync again (just in case a new sstable was written while the first rsync was running)
- restart the node
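Roughly, that could look like the sketch below - just an illustration, assuming the extra disk is mounted at /mnt/extradisk and Cassandra runs as a system service (adjust paths and service commands to your installation):

    # first pass while the node is still running (sstables are immutable)
    rsync -av /mnt/extradisk/cassandra/data/ /var/lib/cassandra/data/

    # edit cassandra.yaml: remove the extra path from data_file_directories,
    # then stop the node
    sudo service cassandra stop

    # second pass to catch any sstables flushed during the first rsync
    rsync -av /mnt/extradisk/cassandra/data/ /var/lib/cassandra/data/

    # restart the node
    sudo service cassandra start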
HTH,
Jan

On 14.01.2016 at 08:38, Rahul Ramesh wrote:
> One update: I cleared the snapshots using the nodetool clearsnapshot command.
> Disk space is recovered now.
>
> Because of this issue, I have mounted one more drive on the server, and
> there are some data files on it. How can I migrate the data so that I
> can decommission the drive?
> Will it work if I just copy all the contents of the table directories to
> one of the other drives?
>
> Thanks for all the help.
>
> Regards,
> Rahul
>
> On Thursday 14 January 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>
>     Hi Jan,
>     I checked it. There are no old keyspaces or tables.
>     Thanks for your pointer, I started looking inside the directories. I
>     see a lot of snapshot directories inside the table directories. These
>     directories are consuming space.
>
>     However, these snapshots are not shown when I issue listsnapshots:
>
>     ./bin/nodetool listsnapshots
>     Snapshot Details:
>     There are no snapshots
>
>     Can I safely delete those snapshots? Why does listsnapshots not show
>     them? And how can we find out in the future whether there are snapshots?
>
>     Thanks,
>     Rahul
>
>     On Thu, Jan 14, 2016 at 12:50 PM, Jan Kesten <j.kes...@enercast.de> wrote:
>
>         Hi Rahul,
>
>         just an idea: did you have a look at the data directories on disk
>         (/var/lib/cassandra/data)? It could be that there are some from
>         old keyspaces that have been deleted and snapshotted before. Try
>         something like "du -sh /var/lib/cassandra/data/*" to verify
>         which keyspace is consuming your space.
>
>         Jan
>
>         Sent from my iPhone
>
>         On 14.01.2016 at 07:25, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>
>>         Thanks for your suggestion.
>>
>>         Compaction was happening on one of the large tables. The disk
>>         space did not decrease much after the compaction, so I ran an
>>         external compaction. The disk space decreased by around 10%.
>>         However, it is still consuming close to 750 GB for a load of 250 GB.
>>
>>         I even restarted Cassandra, thinking there might be some open
>>         files. However, it didn't help much.
>>
>>         Is there any way to find out why so much data is being consumed?
>>
>>         I checked for open files using lsof. There are not any.
>>
>>         *Recovery:*
>>         Just a wild thought:
>>         I am using a replication factor of 2 and I have two nodes. If I
>>         delete the complete data on one of the nodes, will I be able to
>>         recover all the data from the active node?
>>         I don't want to pursue this path, as I want to find out the
>>         root cause of the issue!
>>
>>         Any help will be greatly appreciated.
>>
>>         Thank you,
>>
>>         Rahul
>>
>>         On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo <r...@pythian.com> wrote:
>>
>>             You can check whether the snapshots exist in the snapshots folder.
>>             Repairs stream sstables over, which can temporarily increase
>>             disk space. But I think Carlos Alonso might be correct -
>>             running compactions might be the issue.
>>
>>             Regards,
>>
>>             Carlos Juzarte Rolo
>>             Cassandra Consultant
>>             Pythian - Love your data
>>             rolo@pythian | Twitter: @cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
>>             www.pythian.com
>>
>>             On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>>
>>                 I'd also have a look at possible running compactions.
>>
>>                 If you have big column families with STCS, then large
>>                 compactions may be happening.
>>
>>                 Check it with nodetool compactionstats.
>>
>>                 Carlos Alonso | Software Engineer | @calonso
>>
>>                 On 13 January 2016 at 05:22, Kevin O'Connor <ke...@reddit.com> wrote:
>>
>>                     Have you tried restarting? It's possible there are
>>                     open file handles to sstables that have been
>>                     compacted away. You can verify by running lsof and
>>                     grepping for DEL or deleted.
>>
>>                     If it's not that, you can run nodetool cleanup on
>>                     each node to scan all of the sstables on disk and
>>                     remove anything that it's not responsible for.
>>                     Generally this would only work if you added nodes
>>                     recently.
>>
>>                     On Tuesday, January 12, 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>
>>                         We have a 2-node Cassandra cluster with a
>>                         replication factor of 2.
>>
>>                         The load factor on the nodes is around 350 GB:
>>
>>                         Datacenter: Cassandra
>>                         ==========
>>                         Address      Rack   Status  State   Load      Owns     Token
>>                                                                                -5072018636360415943
>>                         172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%  -7068746880841807701
>>                         172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%  -5072018636360415943
>>
>>                         However, if I use df -h:
>>
>>                         /dev/xvdf  252G  223G   17G  94%  /HDD1
>>                         /dev/xvdg  493G  456G   12G  98%  /HDD2
>>                         /dev/xvdh  197G  167G   21G  90%  /HDD3
>>
>>                         HDD1, HDD2 and HDD3 contain only Cassandra data.
>>                         It amounts to close to 1 TB on one of the machines,
>>                         and on the other machine it is close to 650 GB.
>>
>>                         I started a repair 2 days ago; after running the
>>                         repair, the disk space consumption has actually
>>                         increased.
>>                         I also checked whether this is because of
>>                         snapshots. nodetool listsnapshots intermittently
>>                         lists a snapshot, but it goes away after some time.
>>
>>                         Can somebody please help me understand:
>>                         1. Why is so much disk space consumed?
>>                         2. Why did it increase after the repair?
>>                         3. Is there any way to recover from this state?
>>
>>                         Thanks,
>>                         Rahul

--
i.A. Jan Kesten, Software Developer
enercast GmbH
j.kes...@enercast.de | http://www.enercast.de