Hi,

I found that our cluster keeps compacting a single file forever
(Cassandra 0.7.5). We are wondering whether the compaction logic is
wrong, and I'd like to hear your comments.

Situation:
- After trying to repair a column family, our cluster's disk usage is
quite high. Cassandra cannot compact all sstables at once, so at the
end it compacts a single file over and over (see the log excerpt
below).
- Our data has no deletes, so compacting a single file does not free
any disk space.

We are approaching a full disk. I believe the repair operation left a
lot of duplicate data on disk, which is why compaction is needed.
However, most of the nodes get stuck compacting a single file, and the
only thing we can do is restart them.

My question is: why doesn't the compaction stop?

I looked at the logic in CompactionManager.java:
-----------------
        String compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
        // If the compaction file path is null that means we have no space left for this compaction.
        // try again w/o the largest one.
        List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
        while (compactionFileLocation == null && smallerSSTables.size() > 1)
        {
            logger.warn("insufficient space to compact all requested files " + StringUtils.join(smallerSSTables, ", "));
            smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
            compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
        }
        if (compactionFileLocation == null)
        {
            logger.error("insufficient space to compact even the two smallest files, aborting");
            return 0;
        }
-----------------

The while condition is smallerSSTables.size() > 1. With two sstables
and not enough space, the loop removes the larger one, the size drops
to 1, the loop exits, and Cassandra compacts the single remaining file
onto itself. Shouldn't the condition be smallerSSTables.size() > 2?
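
Expressed as a one-line diff against the snippet above (untested, just
to make the proposal concrete): with > 2 the candidate set can never
shrink below two sstables, so the "insufficient space to compact even
the two smallest files" branch fires instead of a one-file compaction.
-----------------
-        while (compactionFileLocation == null && smallerSSTables.size() > 1)
+        while (compactionFileLocation == null && smallerSSTables.size() > 2)
-----------------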

In my understanding, compacting a single file frees disk space only
when the sstable contains a lot of tombstones, and only if those
tombstones are actually removed during the compaction. If Cassandra
knows the sstable has tombstones that can be dropped, it is worth
compacting it. Otherwise it might free space if you are lucky, but in
the worst case it leads to an infinite loop like ours.
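
If we want to keep single-file compaction for the tombstone case, a
guard along these lines might work. This is only a sketch of the idea:
hasDroppableTombstones() and the gcBefore threshold are hypothetical
names, not existing 0.7 APIs.
-----------------
        // Sketch: before falling back to a one-sstable compaction, make sure
        // it can actually reclaim space by dropping tombstones; otherwise the
        // result is a byte-for-byte copy and we loop forever.
        if (smallerSSTables.size() == 1)
        {
            SSTableReader single = smallerSSTables.get(0);
            if (!hasDroppableTombstones(single, gcBefore)) // hypothetical check
            {
                logger.warn("compacting " + single + " alone would not free any space, skipping");
                return 0;
            }
        }
-----------------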

What do you think about this code change?


Best regards,
Shotaro


* Cassandra compaction log
-------------------------
 WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3034-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3035-Data.db.  260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys.  Time: 9,855,385ms.

 WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3035-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3036-Data.db.  260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys.  Time: 9,809,882ms.

 WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3036-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3037-Data.db.  260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys.  Time: 10,087,424ms.
-------------------------
You can see that the compacted size is always the same; it keeps
compacting the same single sstable over and over.
