OK, so my problem persisted. The node that keeps filling up its hard disk has
a 230 GB disk. Right after I restart the node, it deletes the tmp files and is
back down to about 55 GB of data on disk. Then it quickly starts filling the
disk again - I see gigabytes added fast - and it can't be real data, because
the other nodes don't show this.
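
In case it helps anyone reproduce this, something like the rough Java sketch
below can be used to check whether the growth is temporary compaction output
rather than live data. The data directory path and the "-tmp-" filename
pattern are assumptions about my 0.8.5 layout, so adjust them for your setup:

import java.io.File;

// Periodically sums the size of everything under the data directory and,
// separately, of files whose names contain "-tmp-" (temporary compaction
// output in 0.8-era layouts -- an assumption, check your own file names).
public class TmpGrowthWatcher {

    private static final File DATA_DIR = new File("/var/lib/cassandra/data"); // assumption: adjust

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            long[] totals = new long[2]; // [0] = all files, [1] = tmp files
            scan(DATA_DIR, totals);
            System.out.printf("total on disk: %,d bytes, of which tmp: %,d bytes%n",
                              totals[0], totals[1]);
            Thread.sleep(30000); // sample every 30 seconds
        }
    }

    private static void scan(File dir, long[] totals) {
        File[] children = dir.listFiles();
        if (children == null)
            return;
        for (File f : children) {
            if (f.isDirectory()) {
                scan(f, totals);
            } else {
                totals[0] += f.length();
                if (f.getName().contains("-tmp-"))
                    totals[1] += f.length();
            }
        }
    }
}

If the tmp number is the one racing upwards, that would point at compactions
being retried and leaving their partial output behind.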

While all this is happening, I see the node doing a minor compaction of the
main data CF, but extremely slowly. Today I saw this error:

ERROR 09:44:57,605 Fatal exception in thread Thread[CompactionExecutor:15,1,main]
java.io.IOException: File too large
        at java.io.RandomAccessFile.writeBytes(Native Method)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:168)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:242)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:369)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:348)
        at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:114)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:132)
        at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:576)
        at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:507)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:142)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:108)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

which means it cannot finish that compaction because the output file hit the
filesystem's maximum file size. So I checked the filesystem and block size:
it's ext3 with 1K blocks, which means the maximum file size is 16 GB.
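
For reference, here is the arithmetic behind that 16 GB figure as a small
standalone sketch (nothing Cassandra-specific: an ext3 file addresses 12
direct blocks plus single-, double- and triple-indirect trees, each indirect
block holding blockSize / 4 pointers, and the 32-bit 512-byte-sector counter
caps any file at 2 TiB):

public class Ext3MaxFileSize {
    public static void main(String[] args) {
        long twoTiB = 2L * 1024 * 1024 * 1024 * 1024;
        for (long blockSize : new long[] { 1024, 2048, 4096 }) {
            long ptrs = blockSize / 4;                                  // pointers per indirect block
            long blocks = 12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs; // addressable data blocks
            long bytes = Math.min(blocks * blockSize, twoTiB);
            System.out.printf("block size %4d B -> max file ~%.1f GiB%n",
                              blockSize, bytes / (1024.0 * 1024 * 1024));
        }
    }
}

With 1 KiB blocks that comes out to roughly 16 GiB, which is exactly the
ceiling the compaction output ran into; with 4 KiB blocks it would have been
2 TiB.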

I didn't know what to do in this case, so I just decommissioned the node.

Is there a way to get around this maximum file size limit? Is there some
Cassandra configuration that helps avoid it? I'm asking here because I
couldn't find anything about this in the documentation.

I'm waiting for new machines to run Cassandra on... what filesystems are
people using?

Cheers,
Alex



On Thu, Dec 1, 2011 at 10:08 PM, Jahangir Mohammed
<md.jahangi...@gmail.com>wrote:

> Yes, mostly sounds like it. In our case failed repairs were causing
> accumulation of the tmp files.
>
> Thanks,
> Jahangir Mohammed.
>
> On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe <
> sicoe.alexan...@googlemail.com> wrote:
>
>> Hi Jeremiah,
>>  My commitlog was indeed on another disk. I did what you said, and yes, the
>> node restart brings the disk usage back down to around the 50 GB I was
>> expecting. Still, I do not understand how the node managed to get itself
>> into the situation of having these tmp files. Could you clarify what they
>> are, how they are produced and why? I've tried to find a clear definition,
>> but all I could come up with is hints that they are produced during
>> compaction. I also found a thread that describes a similar problem:
>>
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Errors-During-Compaction-td5953493.html
>> As described there, it seems like compaction fails and the tmp files don't
>> get cleaned up, until eventually they fill the disk. Is this what happened
>> in my case? Compactions did not finish properly because disk utilization
>> was over half, and then more and more tmp files accumulated with each new
>> attempt. The Cassandra log seems to confirm this, because I get many of
>> these:
>> ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200
>> CompactionManager.java (line 513) insufficient space to compact even the
>> two smallest files, aborting
>>
>> before I started getting many of these:
>> ERROR [FlushWriter:283] 2011-12-01 04:12:22,917
>> AbstractCassandraDaemon.java (line 139) Fatal exception in thread
>> Thread[FlushWriter:283,5,main] java.lang.RuntimeException:
>> java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes
>>
>> I just want to clearly understand what happened.
>>
>> Thanks,
>> Alex
>>
>>
>> On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan <
>> jeremiah.jor...@morningstar.com> wrote:
>>
>>>  If you are writing data with QUORUM or ALL, you should be safe to
>>> restart Cassandra on that node.  If the extra space is all from *tmp* files
>>> from compaction, they will get deleted at startup.  You will then need to
>>> run repair on that node to get back any data that was missed while it was
>>> full.  If your commit log was on a different device, you may not even have
>>> lost much.
>>>
>>> -Jeremiah
>>>
>>>
>>> On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
>>>
>>> Hello everyone,
>>>  4-node Cassandra 0.8.5 cluster with RF = 2.
>>>  One node started throwing exceptions in its log:
>>>
>>> ERROR 10:02:46,837 Fatal exception in thread
>>> Thread[FlushWriter:1317,5,main]
>>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient
>>> disk space to flush 17296 bytes
>>>         at
>>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:619)
>>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush
>>> 17296 bytes
>>>         at
>>> org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
>>>         at
>>> org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
>>>         at
>>> org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
>>>         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
>>>         at
>>> org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
>>>         at
>>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>         ... 3 more
>>>
>>> I checked the disk and, obviously, it's 100% full.
>>>
>>> How do I recover from this without losing the data? I've got plenty of
>>> space on the other nodes, so I thought of doing a decommission, which I
>>> understand reassigns ranges to the other nodes and replicates data to them.
>>> After that's done, I plan on manually deleting the data on the node and
>>> then rejoining at the same cluster position with auto-bootstrap turned off,
>>> so that I won't get back the old data and can continue receiving new data
>>> on the node.
>>>
>>> Note that I would like to keep 4 nodes in the cluster, because the other
>>> three can barely handle the input load on their own. These are just
>>> long-running tests until I get some better machines.
>>>
>>> One strange thing I found is that the data folder on the node that filled
>>> up the disk is 150 GB (as measured with du), while the data folder on the
>>> other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size
>>> of around 50 GB for all 4 nodes. I thought the node was doing a major
>>> compaction when it filled up the disk... but even that doesn't make sense,
>>> because shouldn't a major compaction at most double the size, not triple
>>> it? Does anyone know how to explain this behavior?
>>>
>>> Thanks,
>>> Alex
>>>
>>>
>>
>


-- 
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow
