Understood. Thanks Edward!

On Sat, Dec 3, 2011 at 6:35 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> There is no way to set a max size on an sstable file. If your Cassandra
> data directory is not on your / filesystem, you could reformat it as ext4
> (or at least ext3 with better options).
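>
> For what it's worth, a quick way to see what you are actually running on
> is Java 7's NIO FileStore. Just a convenience sketch; the data-directory
> path below is an example, adjust it to your install:
>
>     import java.nio.file.FileStore;
>     import java.nio.file.Files;
>     import java.nio.file.Paths;
>
>     public class FsCheck {
>         public static void main(String[] args) throws Exception {
>             // example path -- point this at your actual data directory
>             FileStore store =
>                 Files.getFileStore(Paths.get("/var/lib/cassandra/data"));
>             System.out.println("filesystem: " + store.type()); // e.g. "ext3"
>             System.out.println("usable bytes: " + store.getUsableSpace());
>         }
>     }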
>
>
> On Fri, Dec 2, 2011 at 8:35 AM, Alexandru Dan Sicoe <
> sicoe.alexan...@googlemail.com> wrote:
>
>> Ok, so my problem persisted. The node that is filling up its hard disk has
>> a 230 GB disk. Right after I restart the node, it deletes the tmp files and
>> is back down to about 55 GB of data on disk. Then it quickly starts filling
>> the disk again - I see gigabytes added fast - and it isn't real data,
>> because the other nodes don't have it.
>>
>> While all this is happening, the node is doing a minor compaction of the
>> main data CF, but extremely slowly. Today I saw this error:
>>
>> ERROR 09:44:57,605 Fatal exception in thread Thread[CompactionExecutor:15,1,main]
>> java.io.IOException: File too large
>>         at java.io.RandomAccessFile.writeBytes(Native Method)
>>         at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:168)
>>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:242)
>>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:369)
>>         at org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:348)
>>         at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:114)
>>         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:132)
>>         at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:576)
>>         at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:507)
>>         at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:142)
>>         at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:108)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> which means it cannot finish that compaction because it hit the maximum
>> file size. So I checked the filesystem and block size: ext3 with 1K blocks,
>> which means the maximum file size is 16 GB.
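>>
>> (For anyone wanting to check the arithmetic: ext3 addresses a file through
>> 12 direct block pointers plus single-, double- and triple-indirect blocks
>> of 4-byte pointers, so the ceiling follows from the block size alone. A
>> small illustrative sketch:
>>
>>     public class Ext3MaxFile {
>>         public static void main(String[] args) {
>>             long b = 1024;            // filesystem block size in bytes
>>             long p = b / 4;           // 4-byte pointers per indirect block
>>             long blocks = 12 + p + p * p + p * p * p;
>>             System.out.printf("max file: %d bytes (~%.1f GiB)%n",
>>                               blocks * b,
>>                               blocks * b / (double) (1L << 30));
>>             // for b = 1024 this prints ~16.1 GiB, the limit hit above
>>         }
>>     }
>>
>> With 4K blocks the same formula gives terabytes, which is why ext4, or
>> ext3 with a larger block size, makes the problem go away.)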
>>
>> I didn't know what to do in this case, so I just decommissioned the node.
>>
>> Is there a way to get around this maximum file size limit? Is there some
>> Cassandra configuration that helps avoid it? I'm asking here because I
>> couldn't find anything about this in the documentation.
>>
>> I'm waiting for new machines to run Cassandra on... what file systems are
>> people using?
>>
>> Cheers,
>> Alex
>>
>>
>>
>> On Thu, Dec 1, 2011 at 10:08 PM, Jahangir Mohammed <
>> md.jahangi...@gmail.com> wrote:
>>
>>> Yes, it mostly sounds like it. In our case, failed repairs were causing
>>> the tmp files to accumulate.
>>>
>>> Thanks,
>>> Jahangir Mohammed.
>>>
>>> On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe <
>>> sicoe.alexan...@googlemail.com> wrote:
>>>
>>>> Hi Jeremiah,
>>>>  My commit log was indeed on another disk. I did what you said, and yes,
>>>> restarting the node brings the disk usage back to the roughly 50 GB I was
>>>> expecting. Still, I don't understand how the node got itself into the
>>>> situation of having these tmp files. Could you clarify what they are and
>>>> how and why they are produced? I've tried to find a clear definition, but
>>>> all I could come up with are hints that they are produced during
>>>> compaction.
>>>> I also found a thread that describes a similar problem:
>>>>
>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Errors-During-Compaction-td5953493.html
>>>>
>>>> As described there, it seems that compaction fails and the tmp files
>>>> don't get cleaned up until they fill the disk. Is this what happened in
>>>> my case? Compactions stopped finishing properly once disk utilization
>>>> passed half, and more and more tmp files accumulated with each new
>>>> attempt. The Cassandra log seems to support this, because I got many of
>>>> these:
>>>> ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200 CompactionManager.java (line 513) insufficient space to compact even the two smallest files, aborting
>>>>
>>>> before I started getting many of these:
>>>> ERROR [FlushWriter:283] 2011-12-01 04:12:22,917 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[FlushWriter:283,5,main]
>>>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes
>>>>
>>>> I just want to clearly understand what happened.
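>>>>
>>>> (In case it helps others reading along: the "insufficient space" message
>>>> comes from a pre-compaction free-space guard. Schematically -- this is a
>>>> simplified illustration, not Cassandra's actual code -- it amounts to:
>>>>
>>>>     import java.io.File;
>>>>     import java.util.ArrayList;
>>>>     import java.util.Collections;
>>>>     import java.util.Comparator;
>>>>     import java.util.List;
>>>>
>>>>     public class SpaceGuard {
>>>>         // Keep dropping the largest candidate sstable until the estimated
>>>>         // output fits in the free space; if even the two smallest don't
>>>>         // fit, give up -- hence the "even the two smallest files" error.
>>>>         static List<File> smallestThatFit(List<File> sstables, File dataDir) {
>>>>             List<File> sorted = new ArrayList<>(sstables);
>>>>             sorted.sort(Comparator.comparingLong(File::length));
>>>>             while (sorted.size() >= 2) {
>>>>                 long estimate = sorted.stream()
>>>>                                       .mapToLong(File::length).sum();
>>>>                 if (estimate <= dataDir.getUsableSpace()) return sorted;
>>>>                 sorted.remove(sorted.size() - 1);  // drop largest, retry
>>>>             }
>>>>             return Collections.emptyList();        // abort compaction
>>>>         }
>>>>     }
>>>>
>>>> So once the disk is more than roughly half full of candidate sstables,
>>>> compactions start aborting, and any tmp output they had already written
>>>> can be left behind until the next restart.)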
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>>
>>>> On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan <
>>>> jeremiah.jor...@morningstar.com> wrote:
>>>>
>>>>>  If you are writing data with QUORUM or ALL, you should be safe to
>>>>> restart cassandra on that node.  If the extra space is all from tmp files
>>>>> from compaction, they will get deleted at startup.  You will then need to
>>>>> run repair on that node to get back any data that was missed while it was
>>>>> full.  If your commit log was on a different device, you may not even
>>>>> have lost much.
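>>>>>
>>>>> Concretely, something like the following after the restart (0.8-era
>>>>> nodetool syntax; <node> and <keyspace> are placeholders to fill in):
>>>>>
>>>>>     nodetool -h <node> repair <keyspace>
>>>>>
>>>>> Repair streams back whatever writes the node missed from its replicas
>>>>> while the disk was full.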
>>>>>
>>>>> -Jeremiah
>>>>>
>>>>>
>>>>> On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
>>>>>
>>>>> Hello everyone,
>>>>>  4-node Cassandra 0.8.5 cluster with RF = 2.
>>>>>  One node started throwing exceptions in its log:
>>>>>
>>>>> ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
>>>>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>         at java.lang.Thread.run(Thread.java:619)
>>>>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>>>>>         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
>>>>>         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
>>>>>         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
>>>>>         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
>>>>>         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
>>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>>>         ... 3 more
>>>>>
>>>>> Checked disk and obviously it's 100% full.
>>>>>
>>>>> How do I recover from this without losing the data? I've got plenty of
>>>>> space on the other nodes, so I thought of doing a decommission, which I
>>>>> understand reassigns ranges to the other nodes and replicates data to
>>>>> them. After that's done, I plan on manually deleting the data on the node
>>>>> and then rejoining at the same cluster position with auto-bootstrap
>>>>> turned off, so that I won't get back the old data and can continue
>>>>> receiving new data on the node.
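>>>>>
>>>>> Roughly, that plan would look like this (nodetool syntax and
>>>>> cassandra.yaml option names as of 0.8; <node> is a placeholder):
>>>>>
>>>>>     nodetool -h <node> decommission
>>>>>     # then stop cassandra, clear the data directory, and restart with
>>>>>     # auto_bootstrap: false and the same initial_token in cassandra.yaml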
>>>>>
>>>>> Note, I would like to keep 4 nodes in the cluster because the other
>>>>> three can barely handle the input load on their own. These are just
>>>>> long-running tests until I get some better machines.
>>>>>
>>>>> One strange thing I found is that the data folder on the node that
>>>>> filled up the disk is 150 GB (as measured with du), while the data folder
>>>>> on the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows
>>>>> a size of around 50 GB for all 4 nodes. I thought the node was doing a
>>>>> major compaction when it filled up the disk... but even that doesn't make
>>>>> sense, because shouldn't a major compaction at most double the size, not
>>>>> triple it? Does anyone know how to explain this behavior?
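>>>>>
>>>>> (One plausible accounting, if the failed-compaction theory discussed
>>>>> earlier in the thread is right -- the numbers are purely illustrative:
>>>>>
>>>>>     live sstables                          ~50 GB
>>>>>     in-flight compaction output (tmp)      up to ~50 GB more
>>>>>     orphaned tmp from earlier failures     ~50 GB and growing
>>>>>     -------------------------------------------------------
>>>>>     seen by du                             ~150 GB
>>>>>
>>>>> du counts the tmp files, while OpsCenter reports only live sstables,
>>>>> which would explain why the two disagree.)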
>>>>>
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Alexandru Dan Sicoe
>> MEng, CERN Marie Curie ACEOLE Fellow
>>
>>
>


-- 
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow
