> How come a node would consume 5x its normal data size during the repair
> process?

https://issues.apache.org/jira/browse/CASSANDRA-2699

It's likely a variation based on how out of sync you happen to be, and
whether you have a neighbor that has also been repaired and bloated up
already.

> My setup is kind of strange in that it's only about 80-100GB of data on a 35
> node cluster, with 2 data centers and 3 racks, however the rack assignments
> are unbalanced. One data center has 8 nodes, and the other data center is
> split into 2 racks with one rack of 9 nodes, and the other with 18 nodes.
> However, within each rack, the tokens are distributed equally. It's a long
> sad story about how we ended up this way, but it basically boils down to
> having to utilize existing resources to resolve a production issue.

https://issues.apache.org/jira/browse/CASSANDRA-3810

In terms of DCs, different DCs are effectively independent of each other
in terms of replica placement. So there is no need or desire for two DCs
to be symmetrical.

The racks are important though if you are trying to take advantage of
racks being somewhat independent failure domains (for reasons outlined
in 3810 above).

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
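To illustrate why unbalanced racks within one DC matter, here is a rough Python sketch of a simplified rack-aware placement for the 9-node and 18-node racks described above. It is a toy model and not Cassandra's actual code: the rack names, the replication factor of 2 (the thread does not state one), and the helper functions are all assumptions made for illustration. It walks the ring clockwise and prefers nodes in racks that do not yet hold a replica of the range.

```python
from collections import defaultdict
from fractions import Fraction


def rack_aware_load(racks, rf):
    """Toy model of rack-aware placement within a single DC.

    racks: {rack_name: node_count} (hypothetical names), tokens spread
           evenly within each rack, as described in the thread.
    rf:    replication factor (an assumption; not given in the thread).
    Returns {(rack, node_index): fraction of the ring stored on that node}.
    """
    ring = []
    for rack_index, (rack, count) in enumerate(sorted(racks.items())):
        # Small per-rack offset so tokens from different racks never collide.
        offset = Fraction(rack_index, 1000)
        for i in range(count):
            ring.append((Fraction(i, count) + offset, rack, i))
    ring.sort()

    load = defaultdict(Fraction)
    n = len(ring)
    for k in range(n):
        # Width of the range whose primary replica is node k (wraps at 1).
        width = (ring[k][0] - ring[k - 1][0]) % 1
        replicas, seen_racks, skipped = [], set(), []
        for step in range(n):
            _, rack, i = ring[(k + step) % n]
            if rack not in seen_racks:
                replicas.append((rack, i))
                seen_racks.add(rack)
            else:
                skipped.append((rack, i))
            if len(replicas) == rf:
                break
        # If there are fewer racks than replicas, fall back to skipped nodes.
        replicas += skipped[: rf - len(replicas)]
        for node in replicas:
            load[node] += width
    return dict(load)


def per_rack_totals(load):
    """Sum the per-node ring fractions for each rack."""
    totals = defaultdict(Fraction)
    for (rack, _), share in load.items():
        totals[rack] += share
    return dict(totals)


if __name__ == "__main__":
    racks = {"rack1": 9, "rack2": 18}
    totals = per_rack_totals(rack_aware_load(racks, rf=2))
    for rack, total in sorted(totals.items()):
        print(rack, "total:", total, "avg per node:", total / racks[rack])
```

Under these assumptions every range gets one replica in each rack, so each rack stores a full copy of the data and the average node in the 9-node rack carries about twice the load of a node in the 18-node rack, even though tokens are evenly spread within each rack.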