Re: (unofficial) Community Poll for Production Operators : Repair

Robert Coli Wed, 15 May 2013 14:33:16 -0700

On Wed, May 15, 2013 at 1:27 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> Rob, I was wondering something. Are you a committer working on improving
> repair or something similar?


I am not a committer [1], but I have an active interest in potential
improvements to the best practices for repair. The specific change
that I am considering is a modification to the default
gc_grace_seconds value, which seems to have been picked out of a hat at
10 days. The current implementation of repair has such negative
performance consequences that I do not believe holding onto
tombstones for longer than 10 days could possibly be as bad as the
fixed cost of running repair once every 10 days. I believe this
default is too low (it also does not map cleanly to the
work week!) and should likely be increased to 14, 21 or 28 days.
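
For anyone who wants to try one of those values, here is a rough sketch
of the arithmetic, since gc_grace_seconds is expressed in seconds (the
10-day default is 864000). The keyspace and table names are hypothetical
placeholders, and the helper functions are mine for illustration, not
part of any Cassandra tooling:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def gc_grace_seconds(days):
    """Convert a tombstone retention window in days to seconds."""
    return days * SECONDS_PER_DAY

def alter_statement(keyspace, table, days):
    """Build a CQL statement that sets gc_grace_seconds on one table.

    keyspace and table are placeholders supplied by the caller.
    """
    return "ALTER TABLE {}.{} WITH gc_grace_seconds = {};".format(
        keyspace, table, gc_grace_seconds(days)
    )

if __name__ == "__main__":
    # Current default (10 days) followed by the candidate values.
    for days in (10, 14, 21, 28):
        print(days, "days =", gc_grace_seconds(days), "seconds")
    # Hypothetical keyspace/table names, for illustration only.
    print(alter_statement("my_keyspace", "my_table", 14))
```

Whatever value is chosen, the point stays the same: every node needs a
completed repair within that window, so the number should match a
repair schedule you can actually sustain.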

> Anyway, if a committer (or any other expert) could give us some feedback on
> our comments (Are we doing well or not, whether things we observe are normal
> or unexplained, what is going to be improved in the future about repair...)

1) you are doing things according to best practice
2) unfortunately your experience with significantly degraded
performance, including a blocked go-live due to repair bloat, is pretty
typical
3) the things you are experiencing are part of the current
implementation of repair and are also typical, however I do not
believe they are fully "explained" [2]
4) as has been mentioned further down thread, there are discussions
regarding (and some already committed) improvements to both the
current repair paradigm and an evolution to a new paradigm

Thanks to all for the responses so far, please keep them coming! :D

=Rob
[1] hence the (unofficial) tag for this thread. I do have minor
patches accepted to the codebase, but always merged by an actual
committer. :)
[2] driftx@#cassandra feels that these things are explained/understood
by the core team, and points to
https://issues.apache.org/jira/browse/CASSANDRA-5280 as a useful
approach to minimize them.
