The main issue turned out to be a bug in our code: we were writing a lot of
new columns to the same row key instead of to a new row key, turning what we
expected to be a skinny-row CF into a CF with one very, very wide row. These
writes on the single key were putting pressure on the 3 nodes h…
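To make the effect concrete, here is a minimal Java sketch of why one hot row key concentrates load on a fixed set of replicas. It assumes a hypothetical six-node ring with evenly spaced tokens and a replication factor of 3 (matching the 3 nodes above). As a stand-in partitioner it runs String.hashCode() through the MurmurHash3 32-bit finalizer; this is not Cassandra's actual partitioner or replica placement code. Every write to the same key maps to the same token and therefore to the same 3 replicas, while distinct keys spread across the ring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HotRowSketch {
    static final int RF = 3; // assumed replication factor
    // Hypothetical six-node ring: token -> node name, evenly spaced.
    static final TreeMap<Integer, String> RING = new TreeMap<>();

    static {
        String[] nodes = {"n1", "n2", "n3", "n4", "n5", "n6"};
        int step = Integer.MAX_VALUE / nodes.length;
        for (int i = 0; i < nodes.length; i++) {
            RING.put(i * step, nodes[i]);
        }
    }

    // Stand-in partitioner: String.hashCode() passed through the MurmurHash3
    // 32-bit finalizer so that similar keys land far apart on the ring.
    static int token(String rowKey) {
        int h = rowKey.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE; // keep tokens non-negative
    }

    // Walk clockwise from the key's token and collect RF consecutive nodes.
    static List<String> replicasFor(String rowKey) {
        List<String> replicas = new ArrayList<>();
        Integer t = RING.ceilingKey(token(rowKey));
        if (t == null) t = RING.firstKey(); // wrap around the ring
        while (replicas.size() < RF) {
            replicas.add(RING.get(t));
            t = RING.higherKey(t);
            if (t == null) t = RING.firstKey();
        }
        return replicas;
    }

    // Count replica writes per node. buggy=true reproduces the mistake of
    // putting every new column under the same row key.
    static Map<String, Integer> countWrites(boolean buggy, int writes) {
        Map<String, Integer> perNode = new TreeMap<>();
        for (String node : RING.values()) perNode.put(node, 0);
        for (int i = 0; i < writes; i++) {
            String rowKey = buggy ? "events" : "events-" + i;
            for (String replica : replicasFor(rowKey)) {
                perNode.merge(replica, 1, Integer::sum);
            }
        }
        return perNode;
    }

    public static void main(String[] args) {
        System.out.println("one wide row:     " + countWrites(true, 60_000));
        System.out.println("many skinny rows: " + countWrites(false, 60_000));
    }
}
```

With the buggy key pattern, three nodes receive every write and the other three receive none, no matter how many nodes are added, which is why scaling out alone does little for a single wide row.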
Hi
I'm going with yes to all three of your questions.
I found a very heavily hit index; we have since reworked things to remove the
secondary index entirely.
This fixed a large portion of the problem, but during the panic of the
overloaded cluster we did the simple scale-out trick of doubling the
c…
The node is overloaded with hints.
I'll just grab the comments from the code…
// avoid OOMing due to excess hints. we need to do this check even for "live" nodes, since we can
// still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead
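For anyone less familiar with the hinting path, here is a minimal Java sketch of the kind of guard that comment describes. A global count of in-flight hints is checked before each replica write, and writes are refused once it exceeds a threshold, whether or not the target replica is currently believed live. The class, field, and exception names and the threshold are hypothetical; this is an illustration of the idea, not the Cassandra source.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HintBacklogGuard {
    // Hypothetical exception signalling that the coordinator should shed load.
    public static class HintBacklogExceededException extends RuntimeException {
        public HintBacklogExceededException(String message) { super(message); }
    }

    private final int maxHintsInFlight; // hypothetical threshold
    private final AtomicInteger hintsInFlight = new AtomicInteger();

    public HintBacklogGuard(int maxHintsInFlight) {
        this.maxHintsInFlight = maxHintsInFlight;
    }

    // Run before every replica write, including writes to replicas the failure
    // detector still reports as live: an overloaded node, or one that is dead
    // but not yet known to be dead, can still cause hints to pile up.
    public void checkBeforeWrite() {
        int inFlight = hintsInFlight.get();
        if (inFlight > maxHintsInFlight) {
            throw new HintBacklogExceededException("Too many in-flight hints: " + inFlight);
        }
    }

    // Bookkeeping around writing a hint for a replica that did not acknowledge.
    public void hintStarted() { hintsInFlight.incrementAndGet(); }
    public void hintFinished() { hintsInFlight.decrementAndGet(); }
    public int currentHintsInFlight() { return hintsInFlight.get(); }

    public static void main(String[] args) {
        HintBacklogGuard guard = new HintBacklogGuard(2);
        for (int i = 0; i < 3; i++) guard.hintStarted();
        try {
            guard.checkBeforeWrite();
        } catch (HintBacklogExceededException e) {
            System.out.println("write rejected: " + e.getMessage());
        }
    }
}
```

The point of a blunt global threshold is the one the comment gives: failing writes early is preferable to letting hint bookkeeping grow until the coordinator runs out of heap.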