stays consistently in the 40-60 range, but only recent tables are being compacted.
What I fear is that once TWCS hits a certain compaction threshold, it keeps compacting the same sstables, adding a slice of the most recently flushed data each time, and falls behind. I'd rather it compacted fragments of sstable files from the same bucket together than constantly append to the same sstable. But that assumption is based on a superficial reading of the compactor code. (A rough sketch of the bucketing behavior I have in mind is below, after the quoted thread.)

On Tue, Jul 16, 2019 at 12:47 AM Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On Mon, Jul 15, 2019 at 6:20 PM Carl Mueller
> <carl.muel...@smartthings.com.invalid> wrote:
>
>> Related to our overstreaming, we have a cluster of about 25 nodes, with
>> most at about 1000 sstable files (Data + others).
>>
>> And about four that are at 20,000 - 30,000 sstable files
>> (Data+Index+etc).
>>
>> We have vertically scaled the outlier machines and turned off compaction
>> throttling, thinking it was compaction that couldn't keep up. That
>> stabilized the growth, but the sstable count is not going down.
>>
>> The TWCS code seems to be heavily biased towards "recent" sstables for
>> compaction. We figured we'd boost the throughput/compactors, that would
>> take care of the more recent ones, and the older ones would fall off.
>> But the number of sstables has remained high on a daily basis on the
>> couple of "bad nodes".
>>
>> Is this simply a lack of sufficient compaction throughput? Is there
>> something in TWCS that would force more frequent flushing than normal?
>>
>
> What does nodetool compactionstats say about pending compaction tasks on
> the affected nodes with the high number of files?
>
> Regards,
> --
> Alex
>
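
For anyone following along, here is a minimal, hypothetical sketch (plain Java, not the actual TimeWindowCompactionStrategy source) of the time-window bucketing idea I'm describing: group sstables by the window containing their max data timestamp and work on the newest bucket first. The class and field names are made up for illustration.

import java.util.*;
import java.util.concurrent.TimeUnit;

// Conceptual sketch only -- not Cassandra's real TWCS code.
// Groups sstables into time buckets by max timestamp; iterating the map
// yields the newest window first, which is the "recent" bias in question.
public class TwcsBucketSketch {

    // Hypothetical minimal stand-in for an sstable's metadata.
    record SSTable(String name, long maxTimestampMillis) {}

    // Group sstables by the time window containing their max timestamp,
    // e.g. a 1-day window.
    static Map<Long, List<SSTable>> bucketByWindow(List<SSTable> sstables, long windowMillis) {
        Map<Long, List<SSTable>> buckets = new TreeMap<>(Comparator.reverseOrder());
        for (SSTable s : sstables) {
            long windowStart = (s.maxTimestampMillis() / windowMillis) * windowMillis;
            buckets.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(s);
        }
        return buckets; // newest window first
    }

    public static void main(String[] args) {
        long window = TimeUnit.DAYS.toMillis(1);
        List<SSTable> sstables = List.of(
                new SSTable("a", System.currentTimeMillis()),
                new SSTable("b", System.currentTimeMillis() - TimeUnit.HOURS.toMillis(2)),
                new SSTable("c", System.currentTimeMillis() - TimeUnit.DAYS.toMillis(3)));
        bucketByWindow(sstables, window).forEach((start, group) ->
                System.out.println("window " + start + " -> " + group.size() + " sstables"));
    }
}

If a strategy along these lines keeps re-selecting the newest bucket while flushes keep landing in it, older buckets could starve, which is roughly the falling-behind scenario I'm worried about. Again, this is speculation based on a superficial read of the code, not a statement of how the real implementation chooses its candidates.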