On Mon, Jul 7, 2014 at 9:13 AM, John Sanda <john.sa...@gmail.com> wrote:
> I have a write-heavy table that is using size tiered compaction. I am
> running C* 1.2.9. There is an SSTable that is not getting compacted. It is
> disproportionately larger than the other SSTables. The data file sizes are,
>
> 1.70 GB
> 0.18 GB
> 0.16 GB
> 0.05 GB
> 8.61 GB
>
> If I set the bucket_high compaction property on the table to a
> sufficiently large value, will the 8.61 GB SSTable get compacted? What if
> any drawbacks are there to increasing the bucket_high property?
>
> In what scenarios could I wind up with such a disproportionately large
> SSTable like this? One thing that comes to mind is major compactions, but
> I have not done that.

First, it is very typical for there to be One Larger File in size-tiered
compaction; in a slightly glib summary, that's what makes it size-tiered.

As for the cause in your specific case: perhaps row fragmentation, such that
old row fragments always end up in this file? You could verify whether this
is the case by running a major compaction on the CF. Given your small data
sizes, you will almost certainly Just Win from doing so.

In your case, however, I would also start by checking that this file is
actually live. Cassandra around your version can sometimes leave spurious
dead SSTables in the data directory. Dead files don't get compacted.

=Rob
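
To see why the big file sits untouched under the defaults: roughly speaking, STCS groups SSTables into buckets, and a file only joins a bucket whose average size is "close enough" (within avg * bucket_low .. avg * bucket_high, defaults 0.5 and 1.5). Here is a simplified sketch of that bucketing, not Cassandra's actual code, run against your reported file sizes:

```python
# Simplified sketch of size-tiered bucketing (NOT Cassandra's exact
# implementation; it illustrates the bucket_low/bucket_high idea only).
def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5):
    buckets = []  # each bucket is a list of SSTable sizes (GB)
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A file joins a bucket when it is within the bucket's
            # [avg * bucket_low, avg * bucket_high] size window.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no bucket fits: start a new one
    return buckets

sizes_gb = [1.70, 0.18, 0.16, 0.05, 8.61]
print(bucket_sstables(sizes_gb))
# The 8.61 GB file lands in a bucket by itself.
```

With the default min_threshold of 4, a bucket needs at least four SSTables before it is compacted, so a lone 8.61 GB file in its own bucket never qualifies; raising bucket_high can merge it into a neighboring bucket, but it still needs enough similarly sized peers to reach the threshold.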