Thanks Aaron for the response. We're not doing drain on the nodes, and there's
no such message in the log.
We used LOCAL_QUORUM CL, with:

endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
dynamic_snitch: false
dynamic_snitch_badness_threshold: 0.0

because we have another 3-node DC for Brisk.
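For reference, the cassandra-topology.properties backing the PropertyFileSnitch looks roughly like this for our layout (the IPs and DC/rack names below are made up):

# one ip=DC:RACK line per node
192.168.1.1=DC1:RAC1
192.168.1.2=DC1:RAC1
192.168.1.3=DC1:RAC1
192.168.1.4=DC1:RAC1
192.168.1.5=DC1:RAC1
192.168.1.6=DC1:RAC1
# the separate 3-node Brisk DC
192.168.2.1=BRISK:RAC1
192.168.2.2=BRISK:RAC1
192.168.2.3=BRISK:RAC1
# fallback for any node not listed above
default=DC1:RAC1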
Aaron wrote:
> wrt the Exception, something has shut down the Mutation thread pool. The only
> thing I can see in the code that does this is nodetool drain and running the
> Embedded server. If it was drain you should see an INFO level message "Node is
> drained" somewhere. Could either of these things be happening?
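We checked for that message explicitly, with something like (the log path depends on how Cassandra is installed):

grep "Node is drained" /var/log/cassandra/system.log

and it comes back empty on every node.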
Just wondering, would it help if we shortened rpc_timeout_in_ms (currently
30,000), so that when one node gets hot and responds slowly, the others will
just treat it as down and move on?
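If that's a sane approach, it would just be a one-line change in cassandra.yaml (the value below is only an example):

# currently 30000 on our cluster
rpc_timeout_in_ms: 10000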
On Aug 17, 2011, at 11:35 AM, Hefeng Yuan wrote:
> Sorry, correction, we're using 0.8.1.
Sorry, correction, we're using 0.8.1.
On Aug 17, 2011, at 11:24 AM, Hefeng Yuan wrote:
> Hi,
>
> We're noticing that when one node gets hot (very high cpu usage) because of
> 'nodetool repair', the whole cluster's performance becomes really bad.
Hi,
We're noticing that when one node gets hot (very high cpu usage) because of
'nodetool repair', the whole cluster's performance becomes really bad.
We're using 0.8.0 with the random partitioner. We have 6 nodes with RF 5. Our
repair is scheduled to run once a week, spread across the whole cluster. I d
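To give an idea, a schedule like that can be as simple as staggered cron entries, e.g. (hostnames here are placeholders):

# one node per night, during low-traffic hours
0 2 * * 1 nodetool -h cass1 repair
0 2 * * 2 nodetool -h cass2 repair
0 2 * * 3 nodetool -h cass3 repair
0 2 * * 4 nodetool -h cass4 repair
0 2 * * 5 nodetool -h cass5 repair
0 2 * * 6 nodetool -h cass6 repair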