On Tue, Aug 20, 2013 at 11:35 PM, Keith Wright <kwri...@nanigans.com> wrote:
> Still looking for help! We have stopped almost ALL traffic to the cluster
> and still some nodes are showing almost 1000% CPU for cassandra with no
> iostat activity. We were running cleanup on one of the nodes that was not
> showing load spikes however now when I attempt to stop cleanup there via
> nodetool stop cleanup the java task for stopping cleanup itself is at 1500%
> and has not returned after 2 minutes. This is VERY odd behavior. Any
> ideas? Hardware failure? Network? We are not seeing anything there but
> wanted to get ideas.
>

The most obvious answer is that somehow the problem nodes hit a magical
threshold which makes them "thrash" with GC.

If you restart the affected nodes, does the error condition return? If so,
how quickly?

=Rob
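
One rough way to test the GC theory is to take two snapshots of `jstat -gcutil <pid>` on an affected node a known number of seconds apart and see how much of that interval went to garbage collection. The Python sketch below is not from this thread: the function name `gc_time_fraction` and the sample numbers are made up for illustration, and it assumes the jstat output has a header row with a `GCT` column (cumulative GC time in seconds), which can vary by JVM version.

```python
def gc_time_fraction(first_sample: str, second_sample: str, interval_s: float) -> float:
    """Approximate share of wall-clock time spent in GC between two
    `jstat -gcutil` snapshots taken `interval_s` seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    def total_gc_seconds(sample: str) -> float:
        # First non-empty line is the header, last non-empty line is the data row.
        rows = [line.split() for line in sample.strip().splitlines() if line.strip()]
        header, values = rows[0], rows[-1]
        # Look the column up by name, since column order differs between JVM versions.
        return float(values[header.index("GCT")])

    return (total_gc_seconds(second_sample) - total_gc_seconds(first_sample)) / interval_s


# Hypothetical snapshots, 10 seconds apart (numbers invented for the example).
BEFORE = """
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  12.50  40.00  97.00  59.00   1000   50.000   200  400.000  450.000
"""
AFTER = """
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  12.50  41.00  98.00  59.00   1000   50.000   204  409.000  459.000
"""

if __name__ == "__main__":
    print(f"GC share of interval: {gc_time_fraction(BEFORE, AFTER, 10.0):.0%}")
```

A value close to 1.0, combined with an old generation that stays nearly full after each full collection, would be consistent with the "thrash" described above; a value near 0 would point somewhere other than GC.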