Re: Insufficient disk space to flush

Jeremiah Jordan Thu, 01 Dec 2011 10:00:10 -0800

If you are writing data with QUORUM or ALL, you should be safe to restart Cassandra on that node. If the extra space is all from *tmp* files from compaction, they will get deleted at startup. You will then need to run repair on that node to get back any data that was missed while it was full. If your commit log was on a different device, you may not even have lost much.
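
Why QUORUM or ALL matters here: Cassandra's QUORUM level needs a strict majority of a key's replicas, floor(RF / 2) + 1, to acknowledge a write. With RF = 2, as in the question, that works out to 2, the same as ALL, so any write acknowledged at either level also reached the other replica. A small sketch of the arithmetic follows; the helper names are illustrative, not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM operation: a strict majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def replicas_that_may_be_down(replication_factor: int) -> int:
    """Hypothetical helper: replicas of a key that can be unavailable
    while QUORUM operations on that key can still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: QUORUM={quorum(rf)}, ALL={rf}, "
              f"may be down={replicas_that_may_be_down(rf)}")
```

For RF = 2 this prints QUORUM=2 and ALL=2, with no replica allowed to be down, which is also why QUORUM operations on the full node's ranges cannot succeed while it is offline.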


-Jeremiah


On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:

Hello everyone,
4-node Cassandra 0.8.5 cluster with RF = 2.
One node started throwing exceptions in its log:
ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
        at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
        at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
        at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
        at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more

Checked disk and obviously it's 100% full.
How do I recover from this without losing the data? I've got plenty of space on the other nodes, so I thought of doing a decommission, which I understand reassigns the node's ranges to the other nodes and replicates its data to them. After that's done, I plan to manually delete the data on the node and then rejoin the cluster at the same position with auto-bootstrap turned off, so that I won't get the old data back and can continue receiving new data on the node.
Note that I would like to keep 4 nodes, because the other three can barely handle the input load alone. These are just long-running tests until I get some better machines.
One strange thing I found is that the data folder on the node that filled up its disk is 150 GB (as measured with du), while the data folder on each of the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size of around 50 GB for all 4 nodes. I thought the node was running a major compaction when it filled up the disk... but even that doesn't make sense, because shouldn't a major compaction at most double the size, not triple it? Does anyone know how to explain this behavior?
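
One way to check whether the extra ~100 GB reported by du is leftover compaction output rather than live data is to total the temporary files separately. The sketch below assumes, as the reply's *tmp* suggests, that such files carry `tmp` in their file names; a keyspace or column family whose name itself contains `tmp` would be miscounted. The default path `/var/lib/cassandra/data` is an assumption; pass the node's actual data directory instead.

```python
import os
import sys
from typing import Tuple


def summarize_data_dir(data_dir: str, marker: str = "tmp") -> Tuple[int, int]:
    """Walk data_dir; return (bytes in files whose name contains marker,
    bytes in all other files)."""
    tmp_bytes = 0
    other_bytes = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file may vanish mid-walk (e.g. a compaction finishing); skip it.
                continue
            if marker in name:
                tmp_bytes += size
            else:
                other_bytes += size
    return tmp_bytes, other_bytes


def gib(n: int) -> float:
    """Convert a byte count to GiB."""
    return n / 1024 ** 3


if __name__ == "__main__":
    # Assumed default location; override with the first command-line argument.
    directory = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data"
    tmp, other = summarize_data_dir(directory)
    print(f"temporary files: {gib(tmp):.1f} GiB")
    print(f"other files:     {gib(other):.1f} GiB")
```

If the temporary total roughly accounts for the gap between the full node and its peers, that would fit the reply's explanation that those files are removed at startup.
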
Thanks,
Alex
