We've recently been having issues where, as soon as we start doing heavy writes (via Hadoop), the load really hammers 4 nodes out of the 20. We're using the RandomPartitioner and set the initial tokens for our 20 nodes according to the general spacing formula (sketched below), apart from a few token offsets introduced as we've replaced dead nodes.
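Just to show what I mean by the spacing formula, here is a rough sketch of how we generate tokens (this isn't our exact script, the names are made up, and the ownership helper only looks at primary ranges, ignoring replication):

RING_SIZE = 2 ** 127  # RandomPartitioner token space

def balanced_tokens(node_count):
    """Standard spacing formula: token_i = i * 2**127 / node_count."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

def ownership(tokens):
    """Fraction of the ring each token's node owns (primary ranges only).

    Each node owns the range from its predecessor's token (exclusive)
    up to its own token (inclusive), wrapping around the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # wraps to the last token when i == 0
        span = (tok - prev) % RING_SIZE
        shares[tok] = span / RING_SIZE
    return shares

if __name__ == "__main__":
    tokens = balanced_tokens(20)
    # Substitute the actual tokens from `nodetool ring` here to check
    # whether the offsets left over from replacing dead nodes skew things.
    for tok, share in ownership(tokens).items():
        print(f"{tok:>40}  {share:.2%}")

Plugging our actual tokens from nodetool ring into ownership() shows only small deviations from 5% per node, which is part of why the hot spots surprise us.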
When I say hammers: looking at nodetool tpstats, those 4 nodes have completed something like 70 million MutationStage events, whereas the rest of the cluster has completed 2-20 million. On those 4 nodes the logs show the mutation stage backing up and a lot of dropped read repair messages, and it looks like quite a bit of flushing is going on, along with the auto minor compactions that follow. We are running 0.7.8 and have about 34 column families (counting secondary indexes as column families), so we can't set our memtable throughput in MB very high. We would like to upgrade to 0.8.4 (not least because of JAMM), but it seems something else is going on with our cluster if we're using RP with balanced initial tokens and still have 4 hot nodes.

Do these symptoms and context sound familiar to anyone? Does anyone have suggestions on how to address this kind of case - disproportionate write load?

Thanks,
Jeremy