What is the recommended value for the number of tokens (num_tokens)? I am asking because of the issue with repairs running sequentially, token range after token range.
Rahul Neelakantan

> On Sep 29, 2014, at 2:29 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux <gene.robich...@match.com>
>> wrote:
>> I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in
>> another.
>>
>> Running a repair on a large column family seems to be moving much slower
>> than I expect.
>
> Unfortunately, as others have mentioned, the slowness/broken-ness of repair
> is a long running (groan!) issue and therefore currently expected.
>
> At this time, I do not recommend upgrading to 2.1 in production to attempt to
> fix it. I am also broadly skeptical that it is as fixed in 2.1 as all that.
>
> One can increase gc_grace_seconds to 34 days [1] and repair once a month,
> which should help make repair slightly more tractable.
>
> For now you should probably evaluate which of your column families you
> *absolutely must* repair (because you do DELETE like operations in them,
> etc.) and only repair those.
>
> As an aside, you "just lose" with vnodes and clusters of this size. I presume
> you plan to grow over approx. 9 nodes per DC, in which case you probably do
> want vnodes enabled.
>
> One note:
>> Looking at nodetool compactionstats, it indicates the Validation phase is
>> running and that the total bytes is 4.5T (4505336278756).
>
> This is the uncompressed size, I'm betting your actual on disk size is closer
> to 2T? Even though 2.0 has improved performance for nodes with lots of data,
> 2T per node is still relatively "fat" for a Cassandra node.
>
> =Rob
> [1] https://issues.apache.org/jira/browse/CASSANDRA-5850
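
For reference, the gc_grace_seconds suggestion quoted above (34 days, repair once a month) works out to 2937600 seconds, versus the default of 864000 (10 days). Below is a minimal sketch of the arithmetic that also prints the CQL one might run per table. The keyspace and table names (my_keyspace, my_table) and the helper function names are hypothetical placeholders, not anything from this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def gc_grace_seconds(days: int) -> int:
    """Convert a tombstone grace period in days to seconds."""
    return days * SECONDS_PER_DAY


def alter_table_cql(keyspace: str, table: str, days: int) -> str:
    """Build the CQL statement that sets gc_grace_seconds on one table.

    keyspace and table are placeholders; substitute your own names.
    """
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH gc_grace_seconds = {gc_grace_seconds(days)};"
    )


if __name__ == "__main__":
    # 34 days leaves a few days of slack around a monthly repair schedule.
    print(alter_table_cql("my_keyspace", "my_table", 34))
```

The statement only changes how long tombstones are retained; each table still has to complete a repair within that window for deletes to propagate safely.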