Given the feedback here and on the ticket, I've written up a proposal for a repair sidecar tool in the ticket's design document: <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.5f10ng8gzle8>. If there are no major concerns, we're going to start porting the Priam implementation into this new tool soon.
-Joey

On Tue, Apr 10, 2018 at 4:17 PM, Elliott Sims <elli...@backblaze.com> wrote:
> My two cents as a (relatively small) user. I'm coming at this from the
> ops/user side, so my apologies if some of these don't make sense based on a
> more detailed understanding of the codebase:
>
> Repair is definitely a major missing piece of Cassandra. Integrated would
> be easier, but a sidecar might be more flexible. As an intermediate step
> that works towards both options, does it make sense to start with
> finer-grained tracking and reporting for subrange repairs? That is, expose
> a set of interfaces (both internally and via JMX) that give a scheduler
> enough information to run subrange repairs across multiple keyspaces, or
> even non-overlapping ranges, at the same time. That would let people
> experiment with, and quickly/safely/easily iterate on, different scheduling
> strategies in the short term, and in the long term those strategies could
> be integrated into a built-in scheduler.
>
> On the subject of scheduling, I think adjusting parallelism/aggressiveness,
> with an optional whitelist or blacklist, would be a lot more useful than a
> "time between repairs". That is, if repairs run for a few hours and then
> don't run for a few (somewhat hard-to-predict) hours, I still have to size
> the cluster for the load while repairs are running. The only reason I can
> think of for an interval between repairs is to allow re-compaction after
> repair anticompactions, and subrange repairs seem to eliminate this. Even
> if they didn't, a more direct rule along the lines of "don't repair when
> the compaction queue is too long" might make more sense. Blacklisted
> timeslots might be useful for avoiding peak hours or batch jobs, but only
> if they can be specified as consistent time-of-day intervals rather than
> unpredictable lulls between repairs.
>
> I really like the idea of automatically adjusting gc_grace_seconds based on
> repair state. The only_purge_repaired_tombstones option fixes this
> elegantly for sequential/incremental repairs on STCS, but not for subrange
> repairs or LCS (unless a scheduler somehow gains the ability to determine
> that every subrange in an sstable has been repaired and mark it
> accordingly?).
>
> On 2018/04/03 17:48:14, Blake Eggleston <b...@apple.com> wrote:
> > Hi dev@,
> >
> > The question of the best way to schedule repairs came up on
> > CASSANDRA-14346, and I thought it would be good to bring up the idea of
> > an external tool on the dev list.
> >
> > Cassandra lacks any sort of tooling for automating the routine tasks
> > required to run a cluster, specifically repair. Like compaction, regular
> > repair is a must for most clusters. This means that, especially as far as
> > eventual consistency is concerned, Cassandra isn't totally functional out
> > of the box. Operators either need to find a third-party solution or
> > implement one themselves. Adding this to Cassandra would make it easier
> > to use.
> >
> > Is this something we should be doing? If so, what should it look like?
> >
> > Personally, I feel like this is a pretty big gap in the project and would
> > like to see an out-of-process tool offered. Ideally, Cassandra would just
> > take care of itself, but writing a distributed repair scheduler that you
> > trust to run in production is a lot harder than writing a single-process
> > management application that can fail over.
> >
> > Any thoughts on this?
> >
> > Thanks,
> >
> > Blake
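For illustration, here is a rough sketch of the kind of JMX surface Elliott describes for subrange repair tracking and control. None of these names exist in Cassandra today, and the sidecar proposal may end up looking quite different; this is only meant to show the sort of information an external scheduler would need.

    // Hypothetical sketch only -- none of these names exist in Cassandra today.
    import java.util.Map;

    public interface SubrangeRepairMXBean
    {
        /** Epoch millis of the last successful repair per token subrange, keyed by "start:end". */
        Map<String, Long> getLastRepairTimes(String keyspace, String table);

        /** Submit a repair for a single token subrange; returns a command id for polling. */
        int repairSubrange(String keyspace, String table, String startToken, String endToken);

        /** Status of a previously submitted subrange repair: PENDING, RUNNING, DONE or FAILED. */
        String getRepairStatus(int commandId);

        /** Number of repair sessions currently running, so a scheduler can cap concurrency. */
        int getActiveRepairSessions();
    }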
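And a minimal sketch of the "don't repair when the compaction queue is too long" gate, reading Cassandra's Compaction PendingTasks gauge over JMX. The class name, threshold, and default 7199 JMX port are placeholders to adjust per cluster.

    // Scheduler-side gate: skip repair when the compaction backlog is large.
    // Reads the org.apache.cassandra.metrics Compaction PendingTasks gauge over JMX;
    // the threshold and the 7199 port are example values, not recommendations.
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public final class RepairGate
    {
        private static final int MAX_PENDING_COMPACTIONS = 50; // arbitrary example threshold

        public static boolean okToRepair(String host) throws Exception
        {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url))
            {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                ObjectName pendingTasks = new ObjectName(
                    "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
                // Gauges registered by Cassandra expose their value via the "Value" attribute.
                Number pending = (Number) conn.getAttribute(pendingTasks, "Value");
                return pending.intValue() < MAX_PENDING_COMPACTIONS;
            }
        }
    }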