I want to check whether you are talking about minor compactions or major (nodetool) compactions. What compaction settings do you have for this CF? You can increase the min compaction threshold to reduce the frequency of compactions; see http://wiki.apache.org/cassandra/StorageConfiguration. It seems like compaction is running continually: are there pending tasks in the o.a.c.db.CompactionManager MBean? How bad is your disk space problem?
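To illustrate why raising the min compaction threshold makes minor compactions less frequent, here is a simplified, hypothetical sketch of size-tiered bucketing. It assumes a file joins a bucket when its size is within roughly 0.5x to 1.5x of that bucket's average, and that a bucket is only compacted once it holds at least minThreshold files. The class and method names are made up for illustration; this is not Cassandra's actual CompactionManager code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of size-tiered bucketing; not Cassandra's real code.
public class BucketSketch {

    // Group file sizes into buckets of "similar" size. Assumed rule: a file joins the
    // first bucket whose (integer) running average is within 0.5x..1.5x of its size.
    public static List<List<Long>> buckets(List<Long> sizes) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> averages = new ArrayList<>();
        for (long size : sizes) {
            boolean placed = false;
            for (int i = 0; i < result.size(); i++) {
                long avg = averages.get(i);
                if (size > avg / 2 && size < avg * 3 / 2) {
                    List<Long> bucket = result.get(i);
                    bucket.add(size);
                    // Update the running average with integer arithmetic.
                    averages.set(i, (avg * (bucket.size() - 1) + size) / bucket.size());
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                List<Long> bucket = new ArrayList<>();
                bucket.add(size);
                result.add(bucket);
                averages.add(size);
            }
        }
        return result;
    }

    // Count the buckets that would trigger a minor compaction under the given threshold.
    public static int compactableBuckets(List<Long> sizes, int minThreshold) {
        int count = 0;
        for (List<Long> bucket : buckets(sizes)) {
            if (bucket.size() >= minThreshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Illustrative sizes (MB): four similar small files and two similar large files.
        List<Long> sizes = List.of(10L, 11L, 12L, 9L, 400L, 420L);
        System.out.println(compactableBuckets(sizes, 2)); // 2
        System.out.println(compactableBuckets(sizes, 4)); // 1
        System.out.println(compactableBuckets(sizes, 8)); // 0
    }
}
```

With the same set of files, a higher threshold leaves fewer buckets eligible, so compactions are triggered less often and each one merges more files.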
For the code change, AFAIK it's not possible for Cassandra to know whether an SSTable contains tombstones that can be purged until the rows are read. Perhaps the file could record the earliest deleted-at time somewhere (and the same for TTL), but I do not think we do that now.

Hope that helps.
Aaron

On 20 Apr 2011, at 21:25, Shotaro Kamio wrote:

> Hi,
>
> I found that our cluster keeps compacting a single file forever
> (cassandra 0.7.5). We are wondering if the compaction logic is wrong. I'd
> like to hear your comments.
>
> Situation:
> - After trying to repair a column family, our cluster's disk usage is
> quite high. Cassandra cannot compact all sstables at once. I think it
> ends up repeatedly compacting a single file. (You can check the attached
> log below.)
> - Our data doesn't have deletes, so compacting a single file
> doesn't free any disk space.
>
> We are approaching a full disk. But I believe that the repair
> operation created a lot of duplicate data on the disk, and it requires
> compaction. However, most of the nodes are stuck compacting a single file.
> The only thing we can do is restart the nodes.
>
> My question is why the compaction doesn't stop.
>
> I looked at the logic in CompactionManager.java:
> -----------------
> String compactionFileLocation =
>     table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
> // If the compaction file path is null that means we have no space left for this compaction.
> // try again w/o the largest one.
> List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
> while (compactionFileLocation == null && smallerSSTables.size() > 1)
> {
>     logger.warn("insufficient space to compact all requested files " + StringUtils.join(smallerSSTables, ", "));
>     smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
>     compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
> }
> if (compactionFileLocation == null)
> {
>     logger.error("insufficient space to compact even the two smallest files, aborting");
>     return 0;
> }
> -----------------
>
> The while condition: smallerSSTables.size() > 1
> Should this be "smallerSSTables.size() > 2"?
>
> In my understanding, compacting a single file frees disk space
> only when the sstable has a lot of tombstones and only if those tombstones
> are removed during the compaction. If Cassandra knows the sstable has
> tombstones to be removed, it's worth compacting it. Otherwise, it
> might free some space if you are lucky. In the worst case, it leads to
> an infinite loop like ours.
>
> What do you think of this code change?
>
>
> Best regards,
> Shotaro
>
>
> * Cassandra compaction log
> -------------------------
> WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3034-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3035-Data.db. 260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys. Time: 9,855,385ms.
>
> WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3035-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3036-Data.db. 260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys. Time: 9,809,882ms.
>
> WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476 CompactionManager.java (line 405) insufficient space to compact all requested files SSTableReader(path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3036-Data.db')
> INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903 CompactionManager.java (line 482) Compacted to foobar-tmp-f-3037-Data.db. 260,646,760,319 to 260,646,760,319 (~100% of original) bytes for 6,893,896 keys. Time: 10,087,424ms.
> -------------------------
> You can see that the compacted size is always the same. It keeps compacting
> the same single sstable.
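As a hedged sketch of the change discussed in this thread, the following models the quoted candidate-selection loop with plain file sizes instead of SSTableReader objects, and treats the expected output size as the sum of the inputs. It stops shrinking the candidate set at two files, as Shotaro proposes, and only allows a single-file compaction when the file is known to hold purgeable tombstones. The minDeletionTime field and gcBefore parameter are hypothetical stand-ins for the "earliest deleted-at time" metadata suggested in the reply, which Cassandra does not currently store; all names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the candidate-selection loop; not Cassandra's real code.
public class CompactionGuardSketch {

    // Stand-in for SSTableReader: size plus an assumed earliest deletion time
    // recorded in the file's metadata (Long.MAX_VALUE means "no tombstones").
    public record Candidate(String name, long size, long minDeletionTime) {}

    static long totalSize(List<Candidate> files) {
        long total = 0;
        for (Candidate c : files) {
            total += c.size();
        }
        return total;
    }

    // Returns the files to compact, or an empty list if compaction should be skipped.
    public static List<Candidate> select(List<Candidate> sstables, long freeSpace, long gcBefore) {
        List<Candidate> smaller = new ArrayList<>(sstables);
        // Drop the largest file until the expected output fits, but stop at two files
        // (the proposed "size() > 2" condition instead of "size() > 1").
        while (totalSize(smaller) > freeSpace && smaller.size() > 2) {
            Candidate largest = smaller.stream()
                    .max(Comparator.comparingLong(Candidate::size))
                    .orElseThrow();
            smaller.remove(largest);
        }
        if (totalSize(smaller) <= freeSpace) {
            return smaller;
        }
        // Even the two smallest files do not fit. Compacting one file alone can only free
        // space if it holds tombstones old enough to purge (hypothetical metadata check).
        for (Candidate c : smaller) {
            if (c.size() <= freeSpace && c.minDeletionTime() < gcBefore) {
                return List.of(c);
            }
        }
        return List.of(); // abort instead of rewriting a file to the same size forever
    }

    public static void main(String[] args) {
        long none = Long.MAX_VALUE; // marker: the file holds no tombstones
        // Two files that do not fit together and have no tombstones: the quoted loop
        // would compact "b" alone; this sketch skips the compaction instead.
        List<Candidate> files = List.of(new Candidate("a", 300L, none), new Candidate("b", 260L, none));
        System.out.println(select(files, 270L, 1000L).size()); // 0
        // Same sizes, but "b" has a deletion older than gcBefore, so compacting it alone may help.
        List<Candidate> withTombstones = List.of(new Candidate("a", 300L, none), new Candidate("b", 260L, 500L));
        System.out.println(select(withTombstones, 270L, 1000L).size()); // 1
    }
}
```

The sizes and times in main are arbitrary illustrative units, not values taken from the log above.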