OK, so my problem persisted. The node that keeps filling up its hard disk has a 230 GB disk. Right after I restart the node, it deletes the tmp files and drops back to about 55 GB of data on disk. Then it quickly starts filling the disk again; I see gigabytes being added fast, and it can't be real data, because the other nodes don't show this.
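In case it is useful to anyone chasing the same thing, here is a minimal sketch of how one might confirm that the growth comes from temporary compaction SSTables rather than live data. The data directory path and the "-tmp-" filename marker are assumptions based on a default 0.8-era install; adjust them to match your setup.

#!/usr/bin/env python
# Rough sketch: sum the size of temporary compaction SSTables under the data
# directory, to confirm the "phantom" growth is tmp files and not live data.
import os

DATA_DIR = "/var/lib/cassandra/data"   # assumption: default data_file_directories

total = 0
for root, dirs, files in os.walk(DATA_DIR):
    for name in files:
        if "-tmp-" in name:            # assumption: marker used for in-progress compaction output
            total += os.path.getsize(os.path.join(root, name))

print("temporary SSTable bytes: %d (%.1f GB)" % (total, total / (1024.0 ** 3)))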
While all this is happening, I can see the node doing a minor compaction of the main data CF, but extremely slowly. Today I saw this error:

ERROR 09:44:57,605 Fatal exception in thread Thread[CompactionExecutor:15,1,main]
java.io.IOException: File too large
        at java.io.RandomAccessFile.writeBytes(Native Method)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:168)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:242)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:369)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:348)
        at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:114)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:132)
        at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:576)
        at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:507)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:142)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:108)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

This means it cannot finish that compaction because it hit the maximum file size. So I checked the file system and block size: it's ext3 with 1 KB blocks, which means the maximum file size is 16 GB. I didn't know what to do in this case, so I just decommissioned the node.

Is there a way to get around this maximum file size limit? Is there some Cassandra configuration that helps avoid it? I'm asking here because I couldn't find anything about this in the documentation. I'm waiting for new machines to run Cassandra on....what file systems are people using?

Cheers,
Alex

On Thu, Dec 1, 2011 at 10:08 PM, Jahangir Mohammed <md.jahangi...@gmail.com> wrote:

> Yes, mostly sounds like it. In our case, failed repairs were causing
> accumulation of the tmp files.
>
> Thanks,
> Jahangir Mohammed.
>
> On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe <
> sicoe.alexan...@googlemail.com> wrote:
>
>> Hi Jeremiah,
>> My commit log was indeed on another disk. I did what you said, and yes,
>> the node restart brings the disk usage back to the roughly 50 GB I was
>> expecting. Still, I do not understand how the node managed to get itself
>> into the situation of having these tmp files. Could you clarify what they
>> are, how they are produced and why? I've tried to find a clear definition,
>> but all I could come up with is hints that they are produced during
>> compaction. I also found a thread that describes a similar problem:
>>
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Errors-During-Compaction-td5953493.html
>>
>> As described there, it seems like compaction fails and the tmp files
>> don't get cleaned up until they fill the disk. Is this what happened in my
>> case? Compactions did not finish properly because disk utilization was
>> more than half, and then more and more tmp files accumulated with each new
>> attempt.
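Coming back to the block-size check mentioned above the quoted messages: on ext3 the block size fixes the per-file ceiling (1 KiB blocks cap files at 16 GiB, 2 KiB at 256 GiB, 4 KiB at 2 TiB), which matches the 16 GB I am hitting. A minimal sketch of how one might check the data mount; the mount point below is an assumption, adjust it to your layout.

#!/usr/bin/env python
# Rough sketch: report the block size and filesystem type for the mount that
# holds the Cassandra data directory. DATA_MOUNT is an assumption.
import os

DATA_MOUNT = "/var/lib/cassandra"      # assumption: mount point of the data disk

st = os.statvfs(DATA_MOUNT)
print("block size reported by statvfs: %d bytes" % st.f_bsize)

# Filesystem type, from /proc/mounts (Linux only). If the data directory
# lives on the root filesystem, check the "/" entry instead.
for line in open("/proc/mounts"):
    device, mountpoint, fstype = line.split()[:3]
    if mountpoint != "/" and DATA_MOUNT.startswith(mountpoint):
        print("%s on %s is %s" % (device, mountpoint, fstype))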
>>
>> The Cassandra log seems to confirm this, because I got many of these:
>>
>> ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200 CompactionManager.java (line 513) insufficient space to compact even the two smallest files, aborting
>>
>> before I started getting many of these:
>>
>> ERROR [FlushWriter:283] 2011-12-01 04:12:22,917 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[FlushWriter:283,5,main]
>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes
>>
>> I just want to clearly understand what happened.
>>
>> Thanks,
>> Alex
>>
>>
>> On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan <
>> jeremiah.jor...@morningstar.com> wrote:
>>
>>> If you are writing data with QUORUM or ALL, you should be safe to
>>> restart Cassandra on that node. If the extra space is all from *tmp* files
>>> from compaction, they will get deleted at startup. You will then need to
>>> run repair on that node to get back any data that was missed while it was
>>> full. If your commit log was on a different device, you may not even have
>>> lost much.
>>>
>>> -Jeremiah
>>>
>>>
>>> On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
>>>
>>> Hello everyone,
>>> I have a 4-node Cassandra 0.8.5 cluster with RF = 2.
>>> One node started throwing exceptions in its log:
>>>
>>> ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
>>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:619)
>>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>>>         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
>>>         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
>>>         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
>>>         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
>>>         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>         ... 3 more
>>>
>>> I checked the disk and obviously it's 100% full.
>>>
>>> How do I recover from this without losing the data? I've got plenty of
>>> space on the other nodes, so I thought of doing a decommission, which I
>>> understand reassigns ranges to the other nodes and replicates data to them.
>>> After that's done, I plan on manually deleting the data on the node and
>>> then rejoining it at the same cluster position with auto-bootstrap turned
>>> off, so that I won't get back the old data and can continue receiving new
>>> data on the node.
>>>
>>> Note that I would like to keep 4 nodes, because the other three barely
>>> handle the input load on their own. These are just long-running tests
>>> until I get some better machines.
>>>
>>> One strange thing I found is that the data folder on the node that
>>> filled up the disk is 150 GB (as measured with du), while the data folder
>>> on the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows
>>> a size of around 50 GB for all 4 nodes.
>>> I thought the node was doing a major compaction at the time it filled
>>> up the disk... but even that doesn't make sense, because shouldn't a major
>>> compaction at most double the size, not triple it? Does anyone know how to
>>> explain this behavior?
>>>
>>> Thanks,
>>> Alex
>>>
>>>
>>
>

--
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow
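For reference, the recovery sequence suggested in the quoted messages (restart the node so startup clears the leftover tmp SSTables, then repair so it streams back anything it missed while full) could be scripted along these lines. The host name, service command and JMX port are assumptions; adapt them to your environment.

#!/usr/bin/env python
# Rough sketch of the recovery sequence discussed above.
import subprocess

NODE = "cassandra-node4"   # assumption: the node that filled its disk
JMX_PORT = "7199"          # assumption: default JMX port

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Restart Cassandra on the affected node; startup removes the tmp SSTables.
#    (The service name/command is an assumption and depends on the install.)
run(["ssh", NODE, "sudo", "service", "cassandra", "restart"])

# 2. Repair so replicas send back data the node missed while it was full.
run(["nodetool", "-h", NODE, "-p", JMX_PORT, "repair"])

# 3. Alternatively, to take the node out of the ring entirely, decommission
#    streams its ranges to the remaining nodes:
# run(["nodetool", "-h", NODE, "-p", JMX_PORT, "decommission"])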