On Wed, Aug 21, 2013 at 10:47 AM, Robert Coli <rc...@eventbrite.com> wrote:
> On Tue, Aug 20, 2013 at 11:35 PM, Keith Wright <kwri...@nanigans.com> wrote:
>
>> Still looking for help! We have stopped almost ALL traffic to the
>> cluster and still some nodes are showing almost 1000% CPU for cassandra
>> with no iostat activity. We were running cleanup on one of the nodes that
>> was not showing load spikes however now when I attempt to stop cleanup
>> there via nodetool stop cleanup the java task for stopping cleanup itself
>> is at 1500% and has not returned after 2 minutes. This is VERY odd
>> behavior. Any ideas? Hardware failure? Network? We are not seeing
>> anything there but wanted to get ideas.
>>
>
> The most obvious answer is that somehow the problem nodes hit a magical
> threshold which makes them "thrash" with GC.
>
> If you restart the affected nodes, does the error condition return? If
> so, how quickly?
>

Lol, missed the rest of the responses on thread. NVMD. :D

=Rob