There is no way to set a maximum size on an SSTable file. If your Cassandra
data directory is not on your / filesystem, you could reformat it as ext4
(or at least ext3 with better options, such as a larger block size).
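
For example, a rough sketch of the reformat (assuming the data disk is
/dev/sdb1 mounted at /var/lib/cassandra -- substitute your own device and
mount point, and note you'd have to restore the node's data afterwards,
e.g. via repair):

    umount /var/lib/cassandra
    mkfs.ext4 /dev/sdb1        # ext4: max file size 16 TB with 4 KB blocks
    # or, staying on ext3 but forcing 4 KB blocks (max file size 2 TB):
    # mkfs.ext3 -b 4096 /dev/sdb1
    mount /dev/sdb1 /var/lib/cassandra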

On Fri, Dec 2, 2011 at 8:35 AM, Alexandru Dan Sicoe <
sicoe.alexan...@googlemail.com> wrote:

> Ok, so my problem persisted. On the node that is filling up the hard disk,
> I have a 230 GB disk. Right after I restart the node, it deletes the tmp
> files and settles at 55 GB of data on disk. Then the disk starts to fill up
> quickly again - I see gigabytes added fast - and it's not real data,
> because the other nodes don't show this growth.
>
> While all this is happening, I can see the node doing a minor compaction
> of the main data CF, but extremely slowly. Today I saw this error:
>
> ERROR 09:44:57,605 Fatal exception in thread Thread[CompactionExecutor:15,1,main]
> java.io.IOException: File too large
>         at java.io.RandomAccessFile.writeBytes(Native Method)
>         at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:168)
>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:242)
>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:369)
>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:348)
>         at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:114)
>         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:132)
>         at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:576)
>         at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:507)
>         at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:142)
>         at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:108)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
>
> which means it cannot finish that compaction because the output file hit
> the filesystem's maximum file size. So I checked the filesystem and block
> size: ext3 with 1K blocks, which puts the maximum file size at 16 GB.
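>
> For anyone who wants to check their own setup, something like this should
> show both (adjust the data path and device for your machine):
>
>     df -T /var/lib/cassandra/data                  # filesystem type
>     sudo tune2fs -l /dev/sdb1 | grep 'Block size'  # block size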
>
> I didn't know what to do in this case, so I just decommissioned the node.
>
> Is there a way to get around this max file size limit? Is there some
> Cassandra configuration that helps avoid hitting it? I'm asking here
> because I couldn't find anything about it in the documentation.
>
> I'm waiting for new machines to run Cassandra on... what file systems are
> people using?
>
> Cheers,
> Alex
>
>
>
> On Thu, Dec 1, 2011 at 10:08 PM, Jahangir Mohammed <
> md.jahangi...@gmail.com> wrote:
>
>> Yes, mostly sounds like it. In our case, failed repairs were causing the
>> tmp files to accumulate.
>>
>> Thanks,
>> Jahangir Mohammed.
>>
>> On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe <
>> sicoe.alexan...@googlemail.com> wrote:
>>
>>> Hi Jeremiah,
>>>  My commitlog was indeed on another disk. I did what you said, and yes,
>>> restarting the node brings the disk usage back to the roughly 50 GB I was
>>> expecting. Still, I don't understand how the node got itself into the
>>> situation of having these tmp files. Could you clarify what they are, and
>>> how and why they are produced? I've tried to find a clear definition, but
>>> all I could come up with are hints that they are produced during
>>> compaction. I also found a thread describing a similar problem:
>>>
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Errors-During-Compaction-td5953493.html
>>>
>>> As described there, it seems that compaction fails and the tmp files
>>> don't get cleaned up until they fill the disk. Is this what happened in my
>>> case? Compactions could not finish properly because disk utilization was
>>> over half, and then more and more tmp files accumulated with each new
>>> attempt. The Cassandra log seems to confirm this, because I get many of
>>> these:
>>> ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200 CompactionManager.java (line 513) insufficient space to compact even the two smallest files, aborting
>>>
>>> before I started getting many of these:
>>> ERROR [FlushWriter:283] 2011-12-01 04:12:22,917 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[FlushWriter:283,5,main]
>>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes
>>>
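>>> If it's useful to anyone, the leftovers should be easy to spot, since the
>>> compaction temp files have "tmp" in their names. Something like this
>>> (assuming the default data path) should list them with their sizes:
>>>
>>>     find /var/lib/cassandra/data -name '*tmp*' -exec du -sh {} \;
>>>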
>>> I just want to clearly understand what happened.
>>>
>>> Thanks,
>>> Alex
>>>
>>>
>>> On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan <
>>> jeremiah.jor...@morningstar.com> wrote:
>>>
>>>>  If you are writing data with QUORUM or ALL, you should be safe to
>>>> restart Cassandra on that node.  If the extra space is all from *tmp* files
>>>> from compaction, they will get deleted at startup.  You will then need to
>>>> run repair on that node to get back any data that was missed while it was
>>>> full.  If your commit log was on a different device, you may not even have
>>>> lost much.
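>>>>
>>>> Roughly, something like this (the keyspace name is just a placeholder):
>>>>
>>>>     # restart the node; leftover tmp files are removed at startup
>>>>     nodetool -h localhost repair MyKeyspace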
>>>>
>>>> -Jeremiah
>>>>
>>>>
>>>> On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
>>>>
>>>> Hello everyone,
>>>>  4-node Cassandra 0.8.5 cluster with RF = 2.
>>>>  One node started throwing exceptions in its log:
>>>>
>>>> ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
>>>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>         at java.lang.Thread.run(Thread.java:619)
>>>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>>>>         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
>>>>         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
>>>>         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
>>>>         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
>>>>         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>>         ... 3 more
>>>>
>>>> I checked the disk, and sure enough it's 100% full.
>>>>
>>>> How do I recover from this without losing the data? I've got plenty of
>>>> space on the other nodes, so I thought of doing a decommission, which I
>>>> understand reassigns ranges to the other nodes and replicates the data to
>>>> them. After that's done, I plan to manually delete the data on the node and
>>>> rejoin it at the same cluster position with auto-bootstrap turned off, so
>>>> that it won't take back the old data and can continue receiving new data.
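>>>>
>>>> Concretely, the steps I have in mind are roughly these (the data path is
>>>> illustrative, and the token would be the node's previous one):
>>>>
>>>>     nodetool -h localhost decommission
>>>>     # after decommission completes, wipe the old data:
>>>>     rm -rf /var/lib/cassandra/data/*
>>>>     # in cassandra.yaml before rejoining:
>>>>     #   auto_bootstrap: false
>>>>     #   initial_token: <the node's previous token>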
>>>>
>>>> Note, I would like to keep 4 nodes in, because the other three barely
>>>> handle the input load on their own. These are just long-running tests
>>>> until I get some better machines.
>>>>
>>>> One strange thing I found is that the data folder on the node that
>>>> filled up the disk is 150 GB (as measured with du), while the data folder
>>>> on each of the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter
>>>> shows a size of around 50 GB for all 4 nodes. I thought the node was
>>>> running a major compaction when it filled up the disk... but even that
>>>> doesn't make sense, because shouldn't a major compaction at most double the
>>>> size, not triple it? Does anyone know how to explain this behavior?
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>>
>>>
>>
>
>
> --
> Alexandru Dan Sicoe
> MEng, CERN Marie Curie ACEOLE Fellow
>
>
