Understood. Thanks Edward!

On Sat, Dec 3, 2011 at 6:35 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> There is no way to set a max size on an sstable file. If your Cassandra
> data directory is not your / filesystem, you could reformat it as ext4
> (or at least ext3 with better options).
>
> On Fri, Dec 2, 2011 at 8:35 AM, Alexandru Dan Sicoe
> <sicoe.alexan...@googlemail.com> wrote:
>
>> Ok, so my problem persisted. On the node that is filling up the hard
>> disk, I have a 230 GB disk. Right after I restart the node, it deletes
>> tmp files and reaches 55 GB of data on disk. Then it starts to quickly
>> fill up the disk - I see gigabytes added fast - and it's not real data,
>> because the other nodes don't have it.
>>
>> While all this is happening, I am seeing the node do a minor compaction
>> of the main data CF, but extremely slowly. Today I saw this error:
>>
>> ERROR 09:44:57,605 Fatal exception in thread Thread[CompactionExecutor:15,1,main]
>> java.io.IOException: File too large
>>     at java.io.RandomAccessFile.writeBytes(Native Method)
>>     at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>>     at org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:168)
>>     at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:242)
>>     at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:369)
>>     at org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:348)
>>     at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:114)
>>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:132)
>>     at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:576)
>>     at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:507)
>>     at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:142)
>>     at
>>     org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:108)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:619)
>>
>> This means it cannot finish that compaction because it hit the maximum
>> file size. So I checked the filesystem and block size: it's ext3 with
>> 1 KB blocks, which means the maximum file size is 16 GB.
>>
>> I didn't know what to do in this case, so I just decommissioned the node.
>>
>> Is there a way to get around this maximum file size limit? Is there some
>> Cassandra configuration that helps avoid this? I'm asking here because I
>> couldn't find anything about it in the documentation.
>>
>> I'm waiting for new machines to run Cassandra on... what filesystems are
>> people using?
>>
>> Cheers,
>> Alex
>>
>> On Thu, Dec 1, 2011 at 10:08 PM, Jahangir Mohammed
>> <md.jahangi...@gmail.com> wrote:
>>
>>> Yes, it mostly sounds like it. In our case, failed repairs were causing
>>> accumulation of the tmp files.
>>>
>>> Thanks,
>>> Jahangir Mohammed.
>>>
>>> On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe
>>> <sicoe.alexan...@googlemail.com> wrote:
>>>
>>>> Hi Jeremiah,
>>>> My commitlog was indeed on another disk. I did what you said, and yes,
>>>> the node restart brings the disk usage back down to the roughly 50 GB
>>>> I was expecting. Still, I do not understand how the node managed to get
>>>> itself into the situation of having these tmp files. Could you clarify
>>>> what they are, how they are produced, and why? I've tried to find a
>>>> clear definition, but all I could come up with is hints that they are
>>>> produced during compaction.
>>>> I also found a thread that describes a similar problem:
>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Errors-During-Compaction-td5953493.html
>>>>
>>>> As described there, it seems compaction fails and the tmp files don't
>>>> get cleaned up until they fill the disk. Is this what happened in my
>>>> case? Compactions did not finish properly because disk utilization was
>>>> more than half, and then more and more tmp files accumulated with each
>>>> new attempt. The Cassandra log seems to confirm this, because I got
>>>> many of these:
>>>>
>>>> ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200 CompactionManager.java (line 513) insufficient space to compact even the two smallest files, aborting
>>>>
>>>> before I started getting many of these:
>>>>
>>>> ERROR [FlushWriter:283] 2011-12-01 04:12:22,917 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[FlushWriter:283,5,main]
>>>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes
>>>>
>>>> I just want to clearly understand what happened.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan
>>>> <jeremiah.jor...@morningstar.com> wrote:
>>>>
>>>>> If you are writing data with QUORUM or ALL, you should be safe to
>>>>> restart Cassandra on that node. If the extra space is all from *tmp*
>>>>> files from compaction, they will get deleted at startup. You will then
>>>>> need to run repair on that node to get back any data that was missed
>>>>> while it was full. If your commit log was on a different device, you
>>>>> may not even have lost much.
>>>>>
>>>>> -Jeremiah
>>>>>
>>>>> On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
>>>>>
>>>>> Hello everyone,
>>>>> 4-node Cassandra 0.8.5 cluster with RF = 2.
>>>>> One node started throwing exceptions in its log:
>>>>>
>>>>> ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
>>>>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>>>>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>>>>>     at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
>>>>>     at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
>>>>>     at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
>>>>>     at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
>>>>>     at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
>>>>>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>>>     ... 3 more
>>>>>
>>>>> I checked the disk, and obviously it's 100% full.
>>>>>
>>>>> How do I recover from this without losing the data? I've got plenty of
>>>>> space on the other nodes, so I thought of doing a decommission, which
>>>>> I understand reassigns ranges to the other nodes and replicates data
>>>>> to them. After that's done, I plan on manually deleting the data on
>>>>> the node and then rejoining in the same cluster position with
>>>>> auto-bootstrap turned off, so that I won't get back the old data and
>>>>> can continue taking new data with the node.
>>>>>
>>>>> Note, I would like to have all 4 nodes in, because the other three can
>>>>> barely take the input load alone. These are just long-running tests
>>>>> until I get some better machines.
>>>>> One strange thing I found is that the data folder on the node that
>>>>> filled up the disk is 150 GB (as measured with du), while the data
>>>>> folder on all 3 other nodes is 50 GB. At the same time, DataStax
>>>>> OpsCenter shows a size of around 50 GB for all 4 nodes. I thought the
>>>>> node was doing a major compaction when it filled up the disk... but
>>>>> even that doesn't make sense, because shouldn't a major compaction at
>>>>> worst double the size, not triple it? Does anyone know how to explain
>>>>> this behavior?
>>>>>
>>>>> Thanks,
>>>>> Alex
>>>>>
>>
>> --
>> Alexandru Dan Sicoe
>> MEng, CERN Marie Curie ACEOLE Fellow

--
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow
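The 16 GB figure discussed at the top of the thread (ext3 with 1 KB blocks) can be reproduced from the classic ext2/ext3 block-pointer layout: an inode holds 12 direct block pointers plus single, double, and triple indirect blocks, each indirect block holding 4-byte pointers. A back-of-the-envelope sketch (the layout constants are standard ext2/ext3; the rest is plain arithmetic):

```shell
# Rough maximum file size on ext2/ext3 for a given filesystem block size,
# derived from the 12 direct + single/double/triple indirect pointer layout.
bs=1024                          # block size in bytes (1 KiB, as in the thread)
ptrs=$(( bs / 4 ))               # 4-byte block pointers per indirect block
blocks=$(( 12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs ))
bytes=$(( blocks * bs ))
echo "max file size: $(( bytes / 1024 / 1024 / 1024 )) GiB"   # ~16 GiB
```

With 4 KiB blocks the same arithmetic gives a limit in the terabyte range (ext3 caps it at 2 TiB for other reasons), so reformatting with a larger block size, or moving to ext4 as suggested above, lifts the ceiling well beyond any plausible sstable size.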