On Mon, Nov 24, 2014 at 12:57 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> I’m trying to track down some exceptions in our production cluster.  I
> bumped up our write load and now I’m getting a non-trivial number of these
> exceptions.  Somewhere on the order of 100 per hour.
>
> All machines have a somewhat high CPU load because they’re doing other
> tasks.  I’m worried that perhaps my background tasks are just overloading
> cassandra and one way to mitigate this is to nice them to least favorable
> priority (this is my first task).
>

Two of the three exceptions are timeouts or lack of availability. Seeing this
across your cluster is usually associated with hitting a "pre-fail"
condition in terms of GC, where the amount of data stored per node makes
the steady state working set larger than available non-fragmented heap. If
you're graphing GC time, I would expect to see a concomitant spike there.
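If you aren't graphing GC time yet, here is a minimal sketch of one way to sample it, in Java since Cassandra runs on the JVM. It sums `getCollectionTime()` over every `GarbageCollectorMXBean` in the target JVM and prints the share of each interval spent in GC. The class name `GcTimeSampler`, the 10-second interval, and reaching the node over JMX at `host:7199` (Cassandra's default JMX port) with no authentication are assumptions for illustration. With no argument it samples its own JVM instead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of Cassandra: samples cumulative GC time.
public class GcTimeSampler {

    // Sum of cumulative GC time (ms) across all collectors in the target JVM.
    // getCollectionTime() returns -1 when a collector doesn't report it, so skip those.
    static long totalGcMillis(MBeanServerConnection conn) throws Exception {
        Set<ObjectName> names = conn.queryNames(
                new ObjectName(ManagementFactory.GARBAGE_COLLECTOR_MXBEAN_DOMAIN_TYPE + ",*"), null);
        long total = 0;
        for (ObjectName name : names) {
            GarbageCollectorMXBean gc = ManagementFactory.newPlatformMXBeanProxy(
                    conn, name.toString(), GarbageCollectorMXBean.class);
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of wall-clock time spent in GC between two samples.
    static double gcFraction(long gcBeforeMs, long gcAfterMs, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        return (double) (gcAfterMs - gcBeforeMs) / intervalMs;
    }

    public static void main(String[] args) throws Exception {
        JMXConnector connector = null;
        MBeanServerConnection conn;
        if (args.length > 0) {
            // Assumption: args[0] is "host:port" of an unauthenticated JMX endpoint, e.g. "node1:7199".
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
            connector = JMXConnectorFactory.connect(url);
            conn = connector.getMBeanServerConnection();
        } else {
            // No argument: sample this JVM's own collectors.
            conn = ManagementFactory.getPlatformMBeanServer();
        }
        try {
            long intervalMs = 10_000; // assumed sampling interval
            long prevGc = totalGcMillis(conn);
            long prevTime = System.nanoTime();
            for (int i = 0; i < 6; i++) {
                Thread.sleep(intervalMs);
                long nowGc = totalGcMillis(conn);
                long nowTime = System.nanoTime();
                long elapsedMs = (nowTime - prevTime) / 1_000_000;
                System.out.printf("GC: %d ms of %d ms (%.1f%%)%n",
                        nowGc - prevGc, elapsedMs, 100.0 * gcFraction(prevGc, nowGc, elapsedMs));
                prevGc = nowGc;
                prevTime = nowTime;
            }
        } finally {
            if (connector != null) {
                connector.close();
            }
        }
    }
}
```

Run it as `java GcTimeSampler node1:7199` against each node. A GC percentage that climbs steadily while the timeouts appear would fit the heap-pressure explanation; a flat one points elsewhere, e.g. at CPU contention from the other tasks.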

=Rob
