Hello, I'm working on a two-data-center cluster with 12 nodes in each data center. I recently wanted to add a thirteenth node to one of the data centers to try to validate some load improvements to our hardware configuration. I added the node following the DataStax directions (http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html), and the node appeared to bootstrap correctly and start joining.
I monitored the load and watched it increase, periodically checking iotop to make sure there was still a pulse. Eventually the load topped out at roughly 85% of the average of the other nodes, while iotop showed lots of activity. After a few hours iotop stopped showing activity and the node's load had gone down a small amount, ~50-100 MB. Average load on the other nodes is about 550 GB.

The first time I tried this I let the process run through the weekend, periodically checking on it. Something happened Monday morning which caused Cassandra to die, so I restarted the process. The load immediately began growing, eventually doubling that 85% mark and settling in around ~935 GB, way more than any other node. When it reached this point it did the same thing though: it basically stalled out. The whole time, nodetool status just showed "UJ".

Finally I aborted, cleared the node's data directory, and started over, but again experienced the same stall at the 85% mark. The node took no time at all to get to that point; it was only a few hours. It has now been sitting at 85% for roughly 20 hours and iotop shows no activity.

I am wondering a few things...

1. What's going on?
2. How do I get more information about what is happening with the join process?
3. Has anyone seen this before?

Thanks for your help,
Stan