On Fri, Jan 23, 2015 at 10:03 AM, Robert Wille <rwi...@fold3.com> wrote:
> The docs say "Use -pr to repair only the first range returned by the partitioner". What does this mean? Why would I only want to repair the first range?

If you're repairing the whole cluster, repairing only the primary range on each node avoids repairing every range once per replica, i.e. RF times.

> What are the tradeoffs of a parallel versus serial repair?

Parallel repair affects all replicas simultaneously and can thereby degrade latency for that replica set. Serial repair doesn't, but it is serial and therefore much slower. Serial repair is probably not usable at all with RF>5 or so, unless you set an extremely long gc_grace_seconds.

> What are the recommended options for regular, periodic repair?

(Snapshot/incremental repair, default IIRC in newer Cassandra, changes many of these assumptions. My statements refer to "old-style" nodetool repair.)

The canonical response is to repair the entire cluster with -pr once per gc_grace_seconds.

Regarding frequent repair... consider your RF, your CL, and whether you actually care about consistency and durability for any given columnfamily. If you never do DELETE-like operations (in CQL, this includes things other than DELETE statements) in the CF, you probably shouldn't repair it just for consistency purposes.

Then, consider how long you can tolerate DELETEd data sticking around. If you can tolerate it because you don't DELETE much data, set gc_grace_seconds to at least 34 days. With 34 days, you can begin a repair on the first of the month and have between 3 and 6 days for it to complete. You repair for up to a few days in order to repair a month's data. With shorter repair cycles, you pay the relatively high cost of repair repeatedly.

Last, consider your Cassandra version. Newer versions have had significant focus on streaming and repair stability and performance. Upgrade to the HEAD of 2.0.x if possible.
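The -pr point can be illustrated with a toy ring model in Python. This is a sketch under simplifying assumptions that are not Cassandra's actual placement logic: one primary range per node (no vnodes) and SimpleStrategy-style placement, where range i lives on node i plus the next RF-1 nodes. All function names are hypothetical. Running plain repair on every node touches each range RF times; with -pr each range is repaired exactly once.

```python
"""Toy model: how many times is each token range repaired?

Hypothetical simplification: N nodes, one primary range per node (no vnodes),
and range i replicated on nodes i, i+1, ..., i+RF-1 around the ring.
"""
from collections import Counter


def replicas_of(range_id: int, num_nodes: int, rf: int) -> list[int]:
    # Range i lives on its primary node i plus the next RF-1 nodes clockwise.
    return [(range_id + k) % num_nodes for k in range(rf)]


def ranges_on_node(node: int, num_nodes: int, rf: int) -> list[int]:
    # A node holds a replica of every range whose replica set includes it.
    return [r for r in range(num_nodes) if node in replicas_of(r, num_nodes, rf)]


def repair_counts(num_nodes: int, rf: int, primary_only: bool) -> Counter:
    """Count how often each range is repaired when every node runs repair once."""
    counts: Counter = Counter()
    for node in range(num_nodes):
        if primary_only:
            # -pr: repair only the range this node is primary for.
            counts[node] += 1
        else:
            # Plain repair: repair every range this node holds a replica of.
            for r in ranges_on_node(node, num_nodes, rf):
                counts[r] += 1
    return counts


if __name__ == "__main__":
    # Without -pr each of the 6 ranges is repaired 3 times (= RF); with -pr, once.
    print("without -pr:", sorted(repair_counts(6, 3, primary_only=False).items()))
    print("with -pr:   ", sorted(repair_counts(6, 3, primary_only=True).items()))
```

With vnodes each node owns many small primary ranges, but the counting argument is the same.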
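The 34-day arithmetic can be checked with a short Python sketch. It uses a simplified worst-case model (an assumption, not something Cassandra enforces): a given range may be repaired at the very start of one month's run and at the very end of the next month's run, so the run that starts on the 1st must finish within gc_grace minus the length of the month that just ended. The helper names, and the keyspace/table names in the CQL comment, are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def gc_grace_seconds(days: int) -> int:
    """Convert a gc_grace period in days to the seconds value the table option takes."""
    return days * SECONDS_PER_DAY


def max_repair_days(gc_grace_days: int, prev_month_days: int) -> int:
    """Longest run (whole days) for a repair started on the 1st of a month.

    Assumed worst case for a single range: repaired at the very start of the
    previous month's run, then at the very end of this month's run. The gap is
    prev_month_days + run length, and it must stay <= gc_grace_days.
    """
    return gc_grace_days - prev_month_days


if __name__ == "__main__":
    print("gc_grace_seconds =", gc_grace_seconds(34))
    for prev_month_days in (28, 29, 30, 31):
        print(f"after a {prev_month_days}-day month: finish within "
              f"{max_repair_days(34, prev_month_days)} days")
    # Hypothetical keyspace/table names; the value could be applied with CQL like:
    #   ALTER TABLE my_ks.my_cf WITH gc_grace_seconds = 2937600;
```

Under this model a 31-day month leaves 3 days and February (28 days) leaves 6, which is why a repair that reliably finishes in a few days fits a monthly schedule.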
There's this thing I jokingly call the Coli Conjecture, which says that if you're in a good use case for Cassandra, you probably don't actually care about consistency or durability, even if you think you do. This comes from years of observing consistency edge cases in Cassandra and noticing that very few of the people who detected and reported them seemed to experience very negative results from the perspective of their application. I think it is an interesting observation and a different mindset for many people coming from the non-distributed, normalized, relational world.

=Rob