Hi,

(cassandra 1.0.8)

I stumbled on a piece of code in Memtable that looks like it could hang a thread forever:

public void flushAndSignal(final CountDownLatch latch, ExecutorService writer, final ReplayPosition context)
    {
        writer.execute(new WrappedRunnable()
        {
            public void runMayThrow() throws IOException
            {
                cfs.flushLock.lock();
                try
                {
                    if (!cfs.isDropped())
                    {
                        SSTableReader sstable = writeSortedContents(context);
                        cfs.replaceFlushed(Memtable.this, sstable);
                    }
                }
                finally
                {
                    cfs.flushLock.unlock();
                }
                latch.countDown();
            }
        });
    }

If writeSortedContents throws an IOException, latch.countDown() is never called. Wouldn't it be better to move latch.countDown() into the finally block?

We have had IOExceptions in writeSortedContents while taking a snapshot, and as a result one thread has been hung for 4 days (and still is). It is blocked in ColumnFamilyStore.forceBlockingFlush on future.get(), because the latch.await() in ColumnFamilyStore.maybeSwitchMemtable never returns.
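To illustrate the suggestion, here is a minimal standalone sketch of the pattern, not the actual Cassandra code. The class LatchInFinallyDemo, the method writeContents (a stand-in for writeSortedContents) and the static flushLock (a stand-in for cfs.flushLock) are all hypothetical names made up for this example. It only shows that a waiter is released when countDown() sits in the finally block, even if the write fails.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LatchInFinallyDemo
{
    // Hypothetical stand-in for cfs.flushLock.
    private static final ReentrantLock flushLock = new ReentrantLock();

    // Hypothetical stand-in for writeSortedContents: fails on demand.
    static void writeContents(boolean fail) throws IOException
    {
        if (fail)
            throw new IOException("simulated flush failure");
    }

    // Submits one "flush" task and waits for the latch.
    // Returns true if the latch was released within the timeout.
    static boolean flushAndWait(final boolean fail, long timeoutMillis)
    {
        final CountDownLatch latch = new CountDownLatch(1);
        ExecutorService writer = Executors.newSingleThreadExecutor();
        try
        {
            writer.execute(new Runnable()
            {
                public void run()
                {
                    flushLock.lock();
                    try
                    {
                        writeContents(fail);
                    }
                    catch (IOException e)
                    {
                        // Assumption: real code would log or propagate this.
                        System.err.println("flush failed: " + e.getMessage());
                    }
                    finally
                    {
                        flushLock.unlock();
                        // Runs on both the success and the failure path,
                        // so the waiting thread is always released.
                        latch.countDown();
                    }
                }
            });
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return false;
        }
        finally
        {
            writer.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("success path released: " + flushAndWait(false, 1000));
        System.out.println("failure path released: " + flushAndWait(true, 1000));
    }
}
```

Both calls in main should report true. One caveat with this approach: the waiter is released whether or not the write succeeded, so the caller would probably need some other way to learn that the flush failed.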

Regards

--
Mikael Wikblom
Software Architect
SiteVision AB
019-217058
mikael.wikb...@sitevision.se
http://www.sitevision.se
