Occasionally, as I'm doing my regular anti-entropy repair, I end up with
a node that uses an exceptional amount of disk space: a node that should
hold about 5-6 GB of data ends up with 25+ GB, consuming the limited
disk space I have available.

Why would a node consume 5x its normal data size during the repair
process?

My setup is somewhat unusual: only about 80-100 GB of data on a 35-node
cluster, spread across 2 data centers and 3 racks, with unbalanced rack
assignments. One data center has 8 nodes; the other is split into 2
racks, one with 9 nodes and the other with 18. Within each rack,
however, the tokens are distributed evenly. It's a long, sad story how
we ended up this way, but it basically boils down to having to use
existing resources to resolve a production issue.

Additionally, the repair process takes what feels like an extremely long
time to complete (36+ hours), and nodes always seem to be streaming data
to each other, even on back-to-back executions of the repair.
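
For reference, I'm not doing anything exotic; it's just the stock repair
invocation, along the lines of (keyspace name is a stand-in):

    nodetool repair my_keyspace   # standard full repair
    nodetool netstats             # where I see the constant streaming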

Any help on these issues is appreciated.

- Mike
