Or we could "reserve" space when starting a compaction. On Wed, May 4, 2011 at 2:32 AM, Terje Marthinussen <tmarthinus...@gmail.com> wrote: > Partially, I guess this may be a side effect of multithreaded compactions? > Before running out of space completely, I do see a few of these: > WARN [CompactionExecutor:448] 2011-05-02 01:08:10,480 > CompactionManager.java (line 516) insufficient space to compact all > requested files SSTableReader(path='/data/cassandra/JP_MALL_P > H/Order-f-12858-Data.db'), > SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12851-Data.db'), > SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12864-Data.db') > INFO [CompactionExecutor:448] 2011-05-02 01:08:10,481 StorageService.java > (line 2066) requesting GC to free disk space > In this case, there would be 24 threads that asked if there was empty disk > space. > Most of them probably succeeded in that request, but they could have > requested 24x available space in theory since I do not think there is any > global pool of used disk in place that manages which how much disk space > will be needed for already started compactions? > Of course, regardless how much checking there is in advance, we could still > run out of disk, so I guess there is also a need for checking if diskspace > is about to run out while compaction runs so things may be halted/aborted. > Unfortunately that would need global coordination so we do not stop all > compaction threads.... > After reducing to 6 compaction threads in 0.8 beta2, the data has compacted > just fine without any disk space issues, so I guess another problem you may > hit as you get a lot of sstables which have updates (that is, duplicates) to > the same data, is that of course, the massively concurrent compaction taking > place with nproc threads could also concurrently duplicate all the > duplicates on a large scale. > Yes, this is in favour of multithreaded compaction as it should normally > help keeping sstables to a sane level and avoid such problems, but it is > unfortunately just a kludge to the real problem which is to segment the > sstables somehow on keyspace so we can get down the disk requirements and > recover from scenarios where disk gets above 50%. 
> On Wed, May 4, 2011 at 3:33 PM, Terje Marthinussen
> <tmarthinus...@gmail.com> wrote:
>>
>> Well, I just did not look at these logs very well at all last night.
>> First out-of-disk message:
>>
>> ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027
>> AbstractCassandraDaemon.java (line 112) Fatal exception in thread
>> Thread[CompactionExecutor:387,1,main]
>> java.io.IOException: No space left on device
>>
>> Then, finally, the last one:
>>
>> ERROR [FlushWriter:128] 2011-05-02 01:51:06,112
>> AbstractCassandraDaemon.java (line 112) Fatal exception in thread
>> Thread[FlushWriter:128,5,main]
>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk
>> space to flush 554962 bytes
>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush
>> 554962 bytes
>>         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597)
>>         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100)
>>         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
>>         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50)
>>         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263)
>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>         ... 3 more
>>
>> INFO [CompactionExecutor:451] 2011-05-02 01:51:06,113 StorageService.java
>> (line 2066) requesting GC to free disk space
>> [lots of sstables deleted]
>>
>> After this it starts running again (although not entirely fine, it seems).
>> So the disk seems to have been full for 35 minutes due to un-deleted
>> sstables.
>>
>> Terje
>>
>> On Wed, May 4, 2011 at 6:34 AM, Terje Marthinussen
>> <tmarthinus...@gmail.com> wrote:
>>>
>>> Hm... peculiar.
>>> Post flush is not involved in compactions, right?
>>>
>>> May 2nd:
>>> 01:06 - out of disk
>>> 01:51 - starts a mix of major and minor compactions on different column
>>> families
>>>
>>> It then starts a few extra minor compactions over the day, but given
>>> that there are more than 1000 sstables and we are talking about 3 minor
>>> compactions started, that does not seem normal to me.
>>>
>>> May 3rd: 1 minor compaction started.
>>>
>>> When I checked today, there was a bunch of tmp files on the disk with
>>> last-modified times from 01:something on May 2nd, and 200GB of empty
>>> disk... Definitely no compaction going on.
>>> Guess I will add some debug logging and see if I get lucky and run out
>>> of disk again.
>>>
>>> Terje
>>>
>>> On Wed, May 4, 2011 at 5:06 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>
>>>> Compaction does, but flush didn't until
>>>> https://issues.apache.org/jira/browse/CASSANDRA-2404
>>>>
>>>> On Tue, May 3, 2011 at 2:26 PM, Terje Marthinussen
>>>> <tmarthinus...@gmail.com> wrote:
>>>> > Yes, I realize that.
>>>> > I am a bit curious why it ran out of disk, or rather, why I have
>>>> > 200GB of empty disk now, but unfortunately it seems we may not have
>>>> > had monitoring enabled on this node to tell me what happened in
>>>> > terms of disk usage.
>>>> > I also thought that compaction was supposed to resume (try again
>>>> > with less data) if it fails?
>>>> >
>>>> > Terje
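For reference, the kind of flush-path check CASSANDRA-2404 introduced
amounts to something like the sketch below. This is illustrative only, not
the actual Cassandra implementation: pick a data directory with enough
usable space for the serialized memtable, or fail fast with the
"Insufficient disk space" error seen in the logs above.

    import java.io.File;

    // Illustrative sketch of a pre-flush free-space check.
    public final class FlushSpaceCheck
    {
        public static File pathForFlush(File[] dataDirectories, long expectedBytes)
        {
            for (File dir : dataDirectories)
            {
                // enough headroom in this directory for the flushed sstable?
                if (dir.getUsableSpace() >= expectedBytes)
                    return dir;
            }
            // matches the failure mode seen in the logs above
            throw new RuntimeException(
                "Insufficient disk space to flush " + expectedBytes + " bytes");
        }
    }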
>>>> >
>>>> > On Wed, May 4, 2011 at 3:50 AM, Jonathan Ellis <jbel...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> The post flusher is responsible for updating the commitlog header
>>>> >> after a flush; each task waits for a specific flush to complete,
>>>> >> then does its thing.
>>>> >>
>>>> >> So when you had a flush catastrophically fail, its corresponding
>>>> >> post-flush task will be stuck.
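In other words, the failure mode looks roughly like this sketch
(illustrative, not the actual Cassandra code): the post-flush task waits
for a signal that only a successful flush ever sends, so a flush that dies
mid-way leaves its waiter blocked forever.

    import java.util.concurrent.CountDownLatch;

    // Sketch of a flush paired with its post-flush task.
    class FlushWithPostFlush
    {
        final CountDownLatch flushed = new CountDownLatch(1);

        void flush() throws Exception
        {
            writeSortedContents(); // throws "Insufficient disk space..." here
            flushed.countDown();   // never reached on failure
        }

        // runs on the single-threaded MemtablePostFlusher stage
        void postFlush() throws InterruptedException
        {
            flushed.await();         // stuck if flush() failed before countDown()
            updateCommitlogHeader(); // its actual work never happens
        }

        void writeSortedContents() throws Exception { /* serialize memtable */ }
        void updateCommitlogHeader() { /* mark commitlog segments obsolete */ }
    }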
>>>> >>
>>>> >> On Tue, May 3, 2011 at 1:20 PM, Terje Marthinussen
>>>> >> <tmarthinus...@gmail.com> wrote:
>>>> >> > Just some very tiny amount of writes in the background here (some
>>>> >> > hints spooled up on another node slowly coming in).
>>>> >> > No new data.
>>>> >> >
>>>> >> > I thought there were no exceptions, but I did not look far enough
>>>> >> > back in the log at first. Going back a bit further now, however,
>>>> >> > I see that about 50 hours ago:
>>>> >> >
>>>> >> > ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027
>>>> >> > AbstractCassandraDaemon.java (line 112) Fatal exception in thread
>>>> >> > Thread[CompactionExecutor:387,1,main]
>>>> >> > java.io.IOException: No space left on device
>>>> >> >         at java.io.RandomAccessFile.writeBytes(Native Method)
>>>> >> >         at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:160)
>>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:225)
>>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:356)
>>>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:335)
>>>> >> >         at org.apache.cassandra.io.PrecompactedRow.write(PrecompactedRow.java:102)
>>>> >> >         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:130)
>>>> >> >         at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:566)
>>>> >> >         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:146)
>>>> >> >         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:112)
>>>> >> >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>> >> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>> >> >         at java.lang.Thread.run(Thread.java:662)
>>>> >> > [followed by a few more of those...]
>>>> >> >
>>>> >> > and then a bunch of these:
>>>> >> >
>>>> >> > ERROR [FlushWriter:123] 2011-05-02 01:21:12,690
>>>> >> > AbstractCassandraDaemon.java (line 112) Fatal exception in thread
>>>> >> > Thread[FlushWriter:123,5,main]
>>>> >> > java.lang.RuntimeException: java.lang.RuntimeException:
>>>> >> > Insufficient disk space to flush 40009184 bytes
>>>> >> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>> >> >         at java.lang.Thread.run(Thread.java:662)
>>>> >> > Caused by: java.lang.RuntimeException: Insufficient disk space to
>>>> >> > flush 40009184 bytes
>>>> >> >         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597)
>>>> >> >         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100)
>>>> >> >         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
>>>> >> >         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50)
>>>> >> >         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263)
>>>> >> >         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>> >> >         ... 3 more
>>>> >> >
>>>> >> > It seems like compactions stopped after this (a bunch of tmp
>>>> >> > tables are still there from when those errors were generated),
>>>> >> > and I can only suspect the post flusher may have stopped at the
>>>> >> > same time.
>>>> >> > There is 890GB of disk for data; sstables are currently using
>>>> >> > 604GB (139GB of which is old tmp tables from when it ran out of
>>>> >> > disk), and "ring" tells me the load on the node is 313GB.
>>>> >> >
>>>> >> > Terje
>>>> >> >
>>>> >> > On Wed, May 4, 2011 at 3:02 AM, Jonathan Ellis <jbel...@gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> ... and are there any exceptions in the log?
>>>> >> >>
>>>> >> >> On Tue, May 3, 2011 at 1:01 PM, Jonathan Ellis <jbel...@gmail.com>
>>>> >> >> wrote:
>>>> >> >> > Does it resolve down to 0 eventually if you stop doing writes?
>>>> >> >> >
>>>> >> >> > On Tue, May 3, 2011 at 12:56 PM, Terje Marthinussen
>>>> >> >> > <tmarthinus...@gmail.com> wrote:
>>>> >> >> >> Cassandra 0.8 beta trunk from about 1 week ago:
>>>> >> >> >>
>>>> >> >> >> Pool Name                    Active   Pending      Completed
>>>> >> >> >> ReadStage                         0         0              5
>>>> >> >> >> RequestResponseStage              0         0          87129
>>>> >> >> >> MutationStage                     0         0         187298
>>>> >> >> >> ReadRepairStage                   0         0              0
>>>> >> >> >> ReplicateOnWriteStage             0         0              0
>>>> >> >> >> GossipStage                       0         0        1353524
>>>> >> >> >> AntiEntropyStage                  0         0              0
>>>> >> >> >> MigrationStage                    0         0             10
>>>> >> >> >> MemtablePostFlusher               1       190            108
>>>> >> >> >> StreamStage                       0         0              0
>>>> >> >> >> FlushWriter                       0         0            302
>>>> >> >> >> FILEUTILS-DELETE-POOL             0         0             26
>>>> >> >> >> MiscStage                         0         0              0
>>>> >> >> >> FlushSorter                       0         0              0
>>>> >> >> >> InternalResponseStage             0         0              0
>>>> >> >> >> HintedHandoff                     1         4              7
>>>> >> >> >>
>>>> >> >> >> Anyone with nice theories about the pending value on the
>>>> >> >> >> memtable post flusher?
>>>> >> >> >>
>>>> >> >> >> Regards,
>>>> >> >> >> Terje
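The tpstats output above is exactly what a serial stage with one stuck task
looks like: 1 active, everything else pending behind it. A toy,
self-contained demonstration (plain java.util.concurrent, not Cassandra
code):

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class StuckStageDemo
    {
        public static void main(String[] args) throws InterruptedException
        {
            // one thread, unbounded queue: same shape as a serial stage
            ThreadPoolExecutor stage = new ThreadPoolExecutor(
                    1, 1, 0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<Runnable>());

            // first task blocks forever, like a post-flush task waiting on
            // a flush that died with "Insufficient disk space"
            stage.submit(() -> {
                try { Thread.sleep(Long.MAX_VALUE); }
                catch (InterruptedException ignored) { }
            });

            // every later task just queues up behind it
            for (int i = 0; i < 190; i++)
                stage.submit(() -> { });

            Thread.sleep(100); // give the stuck task time to start
            System.out.println("Active:  " + stage.getActiveCount());  // 1
            System.out.println("Pending: " + stage.getQueue().size()); // 190
            stage.shutdownNow();
        }
    }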
>>>> >> >> >
>>>> >> >> > --
>>>> >> >> > Jonathan Ellis
>>>> >> >> > Project Chair, Apache Cassandra
>>>> >> >> > co-founder of DataStax, the source for professional Cassandra support
>>>> >> >> > http://www.datastax.com
>>>> >> >>
>>>> >> >> --
>>>> >> >> Jonathan Ellis
>>>> >> >> Project Chair, Apache Cassandra
>>>> >> >> co-founder of DataStax, the source for professional Cassandra support
>>>> >> >> http://www.datastax.com
>>>> >>
>>>> >> --
>>>> >> Jonathan Ellis
>>>> >> Project Chair, Apache Cassandra
>>>> >> co-founder of DataStax, the source for professional Cassandra support
>>>> >> http://www.datastax.com
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of DataStax, the source for professional Cassandra support
>>>> http://www.datastax.com
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com