Recently, as soon as we start doing heavy writes (via Hadoop), 4 of our 20 
nodes get hammered.  We're using the random partitioner, and we set the 
initial tokens for the 20 nodes according to the general spacing formula, 
except for a few token offsets where we've replaced dead nodes.
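
For reference, this is the spacing calculation I mean (a quick Python sketch; 
the RandomPartitioner token range is 0 to 2**127, and 20 is our node count):

  # Evenly spaced initial tokens for RandomPartitioner.
  num_nodes = 20
  tokens = [i * (2**127 // num_nodes) for i in range(num_nodes)]
  for i, t in enumerate(tokens):
      print("node %02d: initial_token = %d" % (i, t))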

By "hammers" I mean that nodetool tpstats shows those 4 nodes having completed 
something like 70 million mutation stage events, whereas the rest of the 
cluster has completed 2-20 million.  Correspondingly, the logs on those 4 
nodes show the mutation stage backing up and a lot of dropped read repair 
messages.  There also appears to be quite a bit of flushing going on, and 
consequently a lot of automatic minor compactions.
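
For what it's worth, this is roughly how I pulled those numbers (a rough 
sketch; the hostnames are placeholders, and it assumes the completed count is 
the last column of the MutationStage row in tpstats output):

  # Compare MutationStage completed counts across nodes via nodetool tpstats.
  import subprocess

  hosts = ["cass01", "cass02", "cass03"]  # ... all 20 nodes
  for host in hosts:
      out = subprocess.check_output(["nodetool", "-h", host, "tpstats"])
      for line in out.decode().splitlines():
          if line.startswith("MutationStage"):
              print("%s: %s completed mutations" % (host, line.split()[-1]))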

We are running 0.7.8 and have about 34 column families (counting secondary 
indexes as column families), so we can't set our memtable throughput in MB 
very high.  We would like to upgrade to 0.8.4 (not least because of JAMM), but 
it seems that something else is going on with our cluster if we are using RP 
with balanced initial tokens and still have 4 hot nodes.
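
To give a sense of the constraint, here is the back-of-envelope math (using 
the usual ~3x live-memory overhead rule of thumb; the heap size and per-CF 
throughput below are hypothetical, not our actual settings):

  # Why per-CF memtable throughput can't be large with 34 column families.
  column_families = 34
  throughput_mb_per_cf = 64   # hypothetical memtable_throughput_in_mb
  overhead_factor = 3         # rough live-memory overhead per memtable
  heap_mb = 8 * 1024          # hypothetical 8 GB heap

  memtable_mb = column_families * throughput_mb_per_cf * overhead_factor
  print("worst-case memtable footprint: ~%d MB of a %d MB heap"
        % (memtable_mb, heap_mb))   # ~6528 MB, i.e. most of the heap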

Do these symptoms sound familiar to anyone?  Does anyone have suggestions on 
how to address this kind of disproportionate write load?

Thanks,

Jeremy
