Re: ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears

2012-03-23 Thread Thomas van Neerijnen
The main issue turned out to be a bug in our code whereby we were writing a lot of new columns to the same row key instead of a new row key, turning what we expected to be a skinny-rowed CF into a CF with one very, very wide row. These writes on the single key were putting pressure on the 3 nodes h
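A toy model shows why a key bug like this concentrates load. In Cassandra, every column written under one row key belongs to one row. That row key maps to a single token, so all of those writes land on the same replica set no matter how large the cluster is. The Java below is a minimal sketch under stated assumptions: `WideRowDemo`, `buggyKey` and `intendedKey` are hypothetical names, and the in-memory map only stands in for a column family. It says nothing about Cassandra's storage engine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a column family: row key -> (column name -> value).
public class WideRowDemo {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Insert one column into a row, creating the row on first write; columns kept sorted.
    void insert(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    int rowCount() { return rows.size(); }

    // Number of columns in the widest row.
    int widestRow() {
        int max = 0;
        for (Map<String, String> row : rows.values()) {
            max = Math.max(max, row.size());
        }
        return max;
    }

    // Hypothetical bug: the key ignores the event id, so every write hits the same row.
    static String buggyKey(int eventId) { return "events"; }

    // Intended design: one skinny row per event.
    static String intendedKey(int eventId) { return "event:" + eventId; }

    static WideRowDemo simulate(boolean buggy, int events) {
        WideRowDemo cf = new WideRowDemo();
        for (int i = 0; i < events; i++) {
            String key = buggy ? buggyKey(i) : intendedKey(i);
            // Buggy path appends a new column per event; intended path writes a fixed column set.
            String column = buggy ? "event:" + i : "payload";
            cf.insert(key, column, "v");
        }
        return cf;
    }

    public static void main(String[] args) {
        WideRowDemo wide = simulate(true, 1000);
        WideRowDemo skinny = simulate(false, 1000);
        // prints "buggy:    rows=1, widest=1000"
        System.out.println("buggy:    rows=" + wide.rowCount() + ", widest=" + wide.widestRow());
        // prints "intended: rows=1000, widest=1"
        System.out.println("intended: rows=" + skinny.rowCount() + ", widest=" + skinny.widestRow());
    }
}
```

Because a row is never split across nodes, adding nodes does not spread the buggy workload. Only changing the key choice does.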

2012-03-21 Thread Thomas van Neerijnen
Hi, I'm going with yes to all three of your questions. I found a very heavily hit index, which we have since reworked to remove the secondary index entirely. This fixed a large portion of the problem, but during the panic of the overloaded cluster we did the simple scale-out trick of doubling the c

2012-03-21 Thread aaron morton
The node is overloaded with hints. I'll just grab the comments from code…

    // avoid OOMing due to excess hints. we need to do this check even for "live" nodes, since we can
    // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead
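The quoted comment describes a guard that refuses new writes before pending hints can exhaust the heap. The Java below is a simplified, hypothetical sketch of that kind of back-pressure check. It is not Cassandra's actual code. The class name, the `maxHintsInProgress` threshold and the rule that only destinations with hints already pending get rejected are all assumptions of this sketch. As in the comment, the check does not ask whether the destination is believed "live", because a live but overloaded node generates hints too.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical hint back-pressure guard; names and policy are illustrative assumptions.
public class HintBackpressure {
    private final int maxHintsInProgress;
    private final AtomicInteger totalHintsInProgress = new AtomicInteger();
    private final ConcurrentMap<String, AtomicInteger> hintsInProgress = new ConcurrentHashMap<>();

    public HintBackpressure(int maxHintsInProgress) {
        this.maxHintsInProgress = maxHintsInProgress;
    }

    private AtomicInteger counterFor(String endpoint) {
        return hintsInProgress.computeIfAbsent(endpoint, e -> new AtomicInteger());
    }

    // Reject (return false) when too many hints are in flight overall AND this destination
    // already has hints pending. Liveness is deliberately not consulted.
    public boolean admitWrite(String destination) {
        return !(totalHintsInProgress.get() > maxHintsInProgress
                && counterFor(destination).get() > 0);
    }

    public void hintStarted(String destination) {
        totalHintsInProgress.incrementAndGet();
        counterFor(destination).incrementAndGet();
    }

    public void hintFinished(String destination) {
        totalHintsInProgress.decrementAndGet();
        counterFor(destination).decrementAndGet();
    }

    public static void main(String[] args) {
        HintBackpressure guard = new HintBackpressure(2);
        // A slow replica piles up three hints, exceeding the threshold of two.
        for (int i = 0; i < 3; i++) guard.hintStarted("10.0.0.3");
        System.out.println("to slow node:    " + guard.admitWrite("10.0.0.3")); // false
        System.out.println("to healthy node: " + guard.admitWrite("10.0.0.1")); // true
        // Once the hints are delivered, writes to the slow node are admitted again.
        for (int i = 0; i < 3; i++) guard.hintFinished("10.0.0.3");
        System.out.println("after drain:     " + guard.admitWrite("10.0.0.3")); // true
    }
}
```

Scoping the rejection to destinations that already have pending hints keeps one struggling replica from shutting down writes to the rest of the cluster. A global cap alone would reject everything.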