Thanks for your suggestion. Compaction was happening on one of the large tables, but the disk space did not decrease much after it finished. So I ran an external compaction, and the disk space decreased by around 10%. However, the node is still consuming close to 750 GB for a load of 250 GB.
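For reference, these are the kinds of checks I am running to narrow down where the space is going (a rough sketch only; the keyspace/table name below is a placeholder, on newer Cassandra versions cfstats is called tablestats, and the lsof line assumes a single Cassandra process):

    # Live vs. total space per table; a big gap suggests obsolete sstables
    nodetool cfstats my_keyspace.my_large_table | grep 'Space used'

    # Any snapshots or incremental backups still sitting on disk?
    nodetool listsnapshots
    find /HDD1 /HDD2 /HDD3 -type d \( -name snapshots -o -name backups \) -exec du -sh {} +

    # Deleted sstables still held open by the Cassandra process
    lsof -p "$(pgrep -f CassandraDaemon)" | grep -iE 'DEL|deleted'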
I even restarted Cassandra, thinking there might be some open files, but it didn't help much. Is there any way to find out why so much data is being consumed? I checked for open file handles using lsof; it does not show any deleted files still being held open.

*Recovery:* Just a wild thought: I am using a replication factor of 2 and I have two nodes. If I delete the complete data on one of the nodes, will I be able to recover all the data from the active node? I don't want to pursue this path, as I want to find out the root cause of the issue! (There is a rough sketch of this at the end of this mail.)

Any help will be greatly appreciated.

Thank you,
Rahul

On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo <r...@pythian.com> wrote:

> You can check if the snapshot exists in the snapshot folder.
> Repairs stream sstables over, which can temporarily increase disk space. But
> I think Carlos Alonso might be correct: running compactions might be the
> issue.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>
>> I'd also have a look at possibly running compactions.
>>
>> If you have big column families with STCS, then large compactions may be
>> happening.
>>
>> Check it with nodetool compactionstats.
>>
>> Carlos Alonso | Software Engineer | @calonso
>>
>> On 13 January 2016 at 05:22, Kevin O'Connor <ke...@reddit.com> wrote:
>>
>>> Have you tried restarting? It's possible there are open file handles to
>>> sstables that have been compacted away. You can verify by doing lsof and
>>> grepping for DEL or deleted.
>>>
>>> If it's not that, you can run nodetool cleanup on each node to scan all
>>> of the sstables on disk and remove anything that it's not responsible for.
>>> Generally this would only help if you added nodes recently.
>>>
>>> On Tuesday, January 12, 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>>
>>>> We have a 2-node Cassandra cluster with a replication factor of 2.
>>>>
>>>> The load on the nodes is around 350 GB:
>>>>
>>>> Datacenter: Cassandra
>>>> ==========
>>>> Address      Rack   Status  State   Load      Owns      Token
>>>>                                                         -5072018636360415943
>>>> 172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%   -7068746880841807701
>>>> 172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%   -5072018636360415943
>>>>
>>>> However, if I use df -h:
>>>>
>>>> /dev/xvdf  252G  223G  17G  94%  /HDD1
>>>> /dev/xvdg  493G  456G  12G  98%  /HDD2
>>>> /dev/xvdh  197G  167G  21G  90%  /HDD3
>>>>
>>>> HDD1, HDD2 and HDD3 contain only Cassandra data. It amounts to close to
>>>> 1 TB on one of the machines, and on the other machine it is close to 650 GB.
>>>>
>>>> I started a repair 2 days ago. After running the repair, the disk space
>>>> consumption has actually increased.
>>>> I also checked whether this is because of snapshots. nodetool listsnapshots
>>>> intermittently lists a snapshot, but it goes away after some time.
>>>>
>>>> Can somebody please help me understand:
>>>> 1. Why is so much disk space consumed?
>>>> 2. Why did it increase after the repair?
>>>> 3. Is there any way to recover from this state?
>>>>
>>>> Thanks,
>>>> Rahul
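P.S. On the recovery idea above: since nodetool shows each node owning 100% of the ring, the surviving node should hold a full copy, so re-streaming everything to a wiped node should be possible using the usual "replace a dead node" approach. I am not planning to do this unless nothing else works, but for completeness, here is my rough understanding of what it would involve (untested; the service name and directory paths are guesses for my setup, the exact flag should be checked against the docs for the Cassandra version in use, and with only two nodes any QUORUM reads/writes will fail while one node is down):

    # On the node to be rebuilt
    sudo service cassandra stop

    # Move the old data aside rather than deleting it outright
    # (the data_file_directories from cassandra.yaml; also move the
    #  commitlog and saved_caches directories)
    sudo mv /HDD1/data /HDD1/data.old
    sudo mv /HDD2/data /HDD2/data.old
    sudo mv /HDD3/data /HDD3/data.old

    # Start the node telling it to replace itself (its own IP), so it
    # bootstraps and streams a full copy from the other replica.
    # Note: a seed node will not bootstrap, so it would have to be removed
    # from its own seed list first. JVM option, e.g. added to cassandra-env.sh:
    #   -Dcassandra.replace_address=<this node's IP>
    sudo service cassandra start

    # Once it is back to Up/Normal, remove the option again and run a repair
    nodetool repair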