Given the feedback here and on the ticket, I've written up a proposal
for a repair
sidecar tool
<https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.5f10ng8gzle8>
in the ticket's design document. If there are no major concerns, we're going
to start porting the Priam implementation into this new tool soon.

-Joey

On Tue, Apr 10, 2018 at 4:17 PM, Elliott Sims <elli...@backblaze.com> wrote:

> My two cents as a (relatively small) user.  I'm coming at this from the
> ops/user side, so my apologies if some of these don't make sense to someone
> with a more detailed understanding of the codebase:
>
> Repair is definitely a major missing piece of Cassandra.  Integrated would
> be easier, but a sidecar might be more flexible.  As an intermediate step
> that works towards both options, does it make sense to start with
> finer-grained tracking and reporting for subrange repairs?  That is, expose
> a set of interfaces (both internally and via JMX) that give a scheduler
> enough information to run subrange repairs across multiple keyspaces or
> even non-overlapping ranges at the same time.  That lets people experiment
> with and quickly/safely/easily iterate on different scheduling strategies
> in the short term, and long-term those strategies can be integrated into a
> built-in scheduler.
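>
> Purely as an illustration of the kind of interface I mean (everything here
> is made up for the sake of example, not an existing Cassandra or JMX API):
>
>     import java.util.List;
>
>     // Hypothetical MBean a node could expose so an external scheduler can
>     // see per-subrange repair state and kick off subrange repairs.
>     public interface SubrangeRepairStateMBean
>     {
>         // Last successful repair time (epoch millis) for a token subrange
>         // of a keyspace, or -1 if the subrange has never been repaired.
>         long getLastRepairedAt(String keyspace, String startToken, String endToken);
>
>         // Subranges of the keyspace not repaired within the given window,
>         // encoded as "startToken:endToken" strings.
>         List<String> getSubrangesNotRepairedSince(String keyspace, long sinceMillis);
>
>         // Start a repair of a single subrange; returns a command id the
>         // scheduler can poll for completion.
>         int repairSubrange(String keyspace, String startToken, String endToken);
>     }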
>
> On the subject of scheduling, I think adjusting parallelism/aggression with
> a possible whitelist or blacklist would be a lot more useful than a "time
> between repairs".  That is, if repairs run for a few hours then don't run
> for a few (somewhat hard-to-predict) hours, I still have to size the
> cluster for the load when the repairs are running.  The only reason I can
> think of for an interval between repairs is to allow re-compaction from
> repair anticompactions, and subrange repairs seem to eliminate this.  Even
> if they didn't, a more direct method along the lines of "don't repair when
> the compaction queue is too long" might make more sense.  Blacklisted
> timeslots might be useful for avoiding peak time or batch jobs, but only if
> they can be specified for consistent time-of-day intervals instead of
> unpredictable lulls between repairs.
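>
> To make the "don't repair when the compaction queue is too long" idea
> concrete, the gate I have in mind is roughly the following (just a sketch;
> the threshold, the blackout window, and where the pending-compaction count
> comes from are all hypothetical):
>
>     import java.time.LocalTime;
>
>     // Decides whether the scheduler may start a repair right now, based on
>     // compaction backlog and a fixed daily blackout window.
>     public class RepairGate
>     {
>         private final int maxPendingCompactions;   // e.g. 100
>         private final LocalTime blackoutStart;      // e.g. 18:00, peak traffic
>         private final LocalTime blackoutEnd;        // e.g. 22:00
>
>         public RepairGate(int maxPendingCompactions,
>                           LocalTime blackoutStart,
>                           LocalTime blackoutEnd)
>         {
>             this.maxPendingCompactions = maxPendingCompactions;
>             this.blackoutStart = blackoutStart;
>             this.blackoutEnd = blackoutEnd;
>         }
>
>         // pendingCompactions would come from the node's compaction metrics.
>         public boolean mayRepairNow(int pendingCompactions, LocalTime now)
>         {
>             if (pendingCompactions > maxPendingCompactions)
>                 return false;   // compaction queue too long, back off
>
>             boolean inBlackout = !now.isBefore(blackoutStart)
>                                  && now.isBefore(blackoutEnd);
>             return !inBlackout; // never repair inside the blackout timeslot
>         }
>     }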
>
> I really like the idea of automatically adjusting gc_grace_seconds based on
> repair state.  The only_purge_repaired_tombstones option fixes this
> elegantly for sequential/incremental repairs on STCS, but not for subrange
> repairs or LCS (unless a scheduler somehow gains the ability to determine
> that every subrange in an sstable has been repaired and mark it
> accordingly?)
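>
> For the gc_grace_seconds idea, roughly what I'm picturing is the arithmetic
> below (again, nothing here is an existing API, just a sketch):
>
>     // Keep gc_grace at least as long as the time since the oldest
>     // last-successful subrange repair of the table, plus a margin, so that
>     // no tombstone written after that repair can be purged yet.
>     public final class GcGraceCalculator
>     {
>         public static int safeGcGraceSeconds(long oldestSubrangeRepairMillis,
>                                              long nowMillis,
>                                              int marginSeconds)
>         {
>             long sinceRepairSeconds = (nowMillis - oldestSubrangeRepairMillis) / 1000;
>             long safe = sinceRepairSeconds + marginSeconds;
>             return (int) Math.min(Integer.MAX_VALUE, safe);
>         }
>     }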
>
>
> On 2018/04/03 17:48:14, Blake Eggleston <b...@apple.com> wrote:
> >
> > Hi dev@,
> >
> > The question of the best way to schedule repairs came up on
> > CASSANDRA-14346, and I thought it would be good to bring up the idea of
> > an external tool on the dev list.
> >
> > Cassandra lacks any sort of tools for automating routine tasks that are
> > required for running clusters, specifically repair. Regular repair is a
> > must for most clusters, like compaction. This means that, especially as
> > far as eventual consistency is concerned, Cassandra isn’t totally
> > functional out of the box. Operators either need to find a 3rd party
> > solution or implement one themselves. Adding this to Cassandra would
> > make it easier to use.
> >
> > Is this something we should be doing? If so, what should it look like?
> >
> > Personally, I feel like this is a pretty big gap in the project and
> > would like to see an out of process tool offered. Ideally, Cassandra
> > would just take care of itself, but writing a distributed repair
> > scheduler that you trust to run in production is a lot harder than
> > writing a single process management application that can failover.
> >
> > Any thoughts on this?
> >
> > Thanks,
> >
> > Blake
>
