A tale of a node that never joins...

Stan Lemon Wed, 19 Nov 2014 06:58:32 -0800

Hello,
I'm working on a two data center cluster with 12 nodes in each data center.
I recently wanted to add a thirteenth node to one of the data centers to
try and validate some load improvements to our hardware configuration. I
added the node following the DataStax directions (
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html)
and the node appeared to bootstrap correctly and start joining.


I monitored the load and watched it increase, periodically checking iotop
to make sure there was still a pulse. Eventually the load topped out at
roughly 85% of the average of the other nodes, while iotop still showed
lots of activity. After a few hours iotop stopped showing activity and the
node's load had gone down by a small amount, ~50-100 MB. Average load on
the other nodes is about 550 GB.

The first time I tried this I let the process run through the weekend,
periodically checking on it. Something happened Monday morning which
caused Cassandra to die, so I restarted the process. The load immediately
began growing, eventually doubling that 85% mark and settling at around
935 GB, far more than any other node. When it reached this point it did
the same thing, though: it basically stalled out.

The whole time nodetool status just showed "UJ".
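
The repeated status check described above could be scripted along these
lines. This is only a minimal sketch: the helper names (parse_status,
joining_nodes, poll), the sample addresses and loads, and the assumed
column layout of the `nodetool status` output are illustrative
assumptions, not something taken from the cluster in question.

```python
import re
import subprocess

# Assumed row layout: "<state>  <address>  <load> <unit>  <tokens> ..."
# e.g. "UJ  10.0.0.13  467.5 GB  256  ?  <host id>  RAC1"
ROW = re.compile(r"^([UD][NLJM])\s+(\S+)\s+([\d.]+)\s+([KMGT]?B|bytes)\b")

# Made-up sample output, used only to exercise the parser.
SAMPLE = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns  Host ID  Rack
UN  10.0.0.1   550.1 GB  256     8.3%  aaaa     RAC1
UJ  10.0.0.13  467.5 GB  256     ?     bbbb     RAC1
"""


def parse_status(text):
    """Return {address: (state, load, unit)} for every node row found."""
    nodes = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            state, address, load, unit = match.groups()
            nodes[address] = (state, float(load), unit)
    return nodes


def joining_nodes(nodes):
    """Addresses whose state code ends in J (joining), sorted."""
    return sorted(a for a, (state, _, _) in nodes.items() if state.endswith("J"))


def poll():
    """Run `nodetool status` once and parse it (needs nodetool on PATH)."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return parse_status(result.stdout)
```

Calling poll() on a timer and logging the load of each node returned by
joining_nodes() would give a record of when the load stops changing,
instead of relying on occasional manual checks.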

Finally I aborted, cleared the node's data directory and started over,
but again experienced the same stall at the 85% mark. The node took no
time at all to get to that point; it was only a few hours. It has now been
sitting at 85% for roughly 20 hours and iotop shows no activity.

I am wondering a few things...
1. What's going on?
2. How do I get more information about what is happening with the join
process?
3. Has anyone seen this before?

Thanks for your help,
Stan
