In fact all of them say Repaired at: 0.
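For reference, a quick loop like the one below should print that flag for every
sstable in a table's data directory (the path and the my_ks/my_cf names are
placeholders; adjust to your layout):

  # Spot-check the repairedAt flag on each Data.db file for one table.
  # /var/lib/cassandra/data/my_ks/my_cf-* is a placeholder path.
  for f in /var/lib/cassandra/data/my_ks/my_cf-*/*Data.db; do
      echo "== $f"
      sstablemetadata "$f" | grep 'Repaired at'
  done

If any sstables had been promoted to repaired, the sstablerepairedset tool that
ships with 2.1 (tools/bin in the tarball) can mark them unrepaired again, e.g.
sstablerepairedset --really-set --is-unrepaired <Data.db files>, run against a
stopped node.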
On Tue, Aug 7, 2018 at 9:13 PM Brian Spindler <brian.spind...@gmail.com> wrote:

> Hi, I spot checked a couple of the files that were ~200MB and they mostly
> had "Repaired at: 0", so maybe that's not it?
>
> -B
>
> On Tue, Aug 7, 2018 at 8:16 PM <brian.spind...@gmail.com> wrote:
>
>> Everything is ttl’d.
>>
>> I suppose I could use sstablemetadata to see the repaired bit; could I
>> just set that to unrepaired somehow, and would that fix it?
>>
>> Thanks!
>>
>> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>> May be worth seeing if any of the sstables got promoted to repaired - if
>> so, they’re not eligible for compaction with unrepaired sstables, and that
>> could explain some higher counts.
>>
>> Do you actually do deletes, or is everything ttl’d?
>>
>> --
>> Jeff Jirsa
>>
>> On Aug 7, 2018, at 5:09 PM, Brian Spindler <brian.spind...@gmail.com> wrote:
>>
>> Hi Jeff, mostly lots of little files: there will be 4-5 that are 1-1.5 GB
>> or so, and then many at 5-50 MB and many at 40-50 MB each.
>>
>> Re incremental repair: yes, one of my engineers started an incremental
>> repair on this column family that we had to abort. In fact, the node that
>> the repair was initiated on ran out of disk space, and we ended up
>> replacing that node like a dead node.
>>
>> Oddly, the new node is experiencing this issue as well.
>>
>> -B
>>
>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> You could toggle off the tombstone compaction to see if that helps, but
>>> that should be lower priority than normal compactions.
>>>
>>> Are the lots-of-little-files from memtable flushes or
>>> repair/anticompaction?
>>>
>>> Do you do normal deletes? Did you try to run incremental repair?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler <brian.spind...@gmail.com> wrote:
>>>
>>> Hi Jonathan, both I believe.
>>>
>>> The window size is 1 day; full settings:
>>>
>>> AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>>>   'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
>>>   'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
>>>   'tombstone_threshold': '0.2', 'class':
>>>   'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>>
>>> nodetool tpstats
>>>
>>> Pool Name                  Active  Pending    Completed  Blocked  All time blocked
>>> MutationStage                   0        0  68582241832        0                 0
>>> ReadStage                       0        0    209566303        0                 0
>>> RequestResponseStage            0        0  44680860850        0                 0
>>> ReadRepairStage                 0        0     24562722        0                 0
>>> CounterMutationStage            0        0            0        0                 0
>>> MiscStage                       0        0            0        0                 0
>>> HintedHandoff                   1        1          203        0                 0
>>> GossipStage                     0        0      8471784        0                 0
>>> CacheCleanupExecutor            0        0          122        0                 0
>>> InternalResponseStage           0        0       552125        0                 0
>>> CommitLogArchiver               0        0            0        0                 0
>>> CompactionExecutor              8       42      1433715        0                 0
>>> ValidationExecutor              0        0         2521        0                 0
>>> MigrationStage                  0        0       527549        0                 0
>>> AntiEntropyStage                0        0         7697        0                 0
>>> PendingRangeCalculator          0        0           17        0                 0
>>> Sampler                         0        0            0        0                 0
>>> MemtableFlushWriter             0        0       116966        0                 0
>>> MemtablePostFlush               0        0       209103        0                 0
>>> MemtableReclaimMemory           0        0       116966        0                 0
>>> Native-Transport-Requests       1        0   1715937778        0            176262
>>>
>>> Message type      Dropped
>>> READ                    2
>>> RANGE_SLICE             0
>>> _TRACE                  0
>>> MUTATION             4390
>>> COUNTER_MUTATION        0
>>> BINARY                  0
>>> REQUEST_RESPONSE     1882
>>> PAGED_RANGE             0
>>> READ_REPAIR             0
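Jeff's "toggle off the tombstone compaction" suggestion above, read against the
settings just quoted, would amount to flipping unchecked_tombstone_compaction
back to 'false' (and/or raising tombstone_threshold). A rough sketch, with
my_ks.my_cf standing in for the real keyspace/table:

  # Sketch only: same TWCS options as quoted above, but with the opportunistic
  # single-sstable tombstone compactions disabled. my_ks.my_cf is a placeholder.
  cqlsh -e "ALTER TABLE my_ks.my_cf WITH compaction = {
      'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '1',
      'timestamp_resolution': 'MILLISECONDS',
      'tombstone_threshold': '0.2',
      'unchecked_tombstone_compaction': 'false'};"

Compaction options changed this way take effect without a node restart.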
>>> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>
>>>> What's your window size?
>>>>
>>>> When you say backed up, how are you measuring that? Are there pending
>>>> tasks, or do you just see more files than you expect?
>>>>
>>>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler <brian.spind...@gmail.com> wrote:
>>>>
>>>>> Hey guys, quick question:
>>>>>
>>>>> I've got a v2.1 Cassandra cluster: 12 nodes on AWS i3.2xl, commit log
>>>>> on one drive, data on NVMe. That was working very well; it's a
>>>>> time-series DB and has been accumulating data for about 4 weeks.
>>>>>
>>>>> The nodes have increased in load and compaction seems to be falling
>>>>> behind. I used to get about 1 file per day for this column family,
>>>>> roughly a ~30GB Data.db file per day. I am now getting hundreds per day
>>>>> at 1 MB - 50 MB.
>>>>>
>>>>> How do I recover from this?
>>>>>
>>>>> I can scale out to give some breathing room, but will it go back and
>>>>> compact the old days into nicely packed files for the day?
>>>>>
>>>>> I tried setting compaction throughput to 1000 from 256 and it seemed to
>>>>> make things worse for the CPU; it's configured on i3.2xl with 8
>>>>> compaction threads.
>>>>>
>>>>> -B
>>>>>
>>>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think)
>>>>> to get rid of old tombstones. However, running repairs in 2.1 on TWCS
>>>>> column families causes a very large spike in sstable counts due to
>>>>> anti-compaction, which causes a lot of disruption. Is there any other way?
>>>>
>>>> --
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
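A footnote on Jonathan's "are there pending tasks" question and the throughput
experiment in the original post: both are visible and tunable at runtime with
nodetool (the numbers below are just the ones mentioned in the thread):

  # Is compaction actually backed up? Shows pending task count and
  # per-compaction progress.
  nodetool compactionstats

  # The compaction throughput cap is in MB/s and can be changed live;
  # 1000 was tried above, 256 was the previous value, 0 removes the cap.
  nodetool setcompactionthroughput 256

  # concurrent_compactors (8 here, matching the 8 active CompactionExecutor
  # threads in tpstats) is set in cassandra.yaml on 2.1.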