Re: Very large HintsColumnFamily

Rob Coli Fri, 21 Dec 2012 17:01:59 -0800

Before we start: what version of Cassandra are you running?

On Fri, Dec 21, 2012 at 4:25 PM, Keith Wright <kwri...@nanigans.com> wrote:
> This behavior seems to occur if I do a large
> amount of data loading using that node as the coordinator node.


In general you want to use all nodes to coordinate, not a single one.
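
As a rough illustration of spreading coordination, a loader can rotate
through the node list instead of pinning every write to one host. This
is only a sketch: the addresses and the assign_coordinators helper are
made up for the example, and most client libraries can do this rotation
for you.

```python
from itertools import cycle

# Hypothetical node addresses; substitute the real hosts in your cluster.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def assign_coordinators(batches, nodes):
    """Pair each batch with a coordinator, rotating through nodes in order."""
    rotation = cycle(nodes)
    return [(next(rotation), batch) for batch in batches]


if __name__ == "__main__":
    # Each batch would be sent through the host it is paired with.
    for host, batch in assign_coordinators(["batch-0", "batch-1", "batch-2", "batch-3"], NODES):
        print(host, batch)
```

With three nodes and four batches, the fourth batch wraps around to the
first node, so no single coordinator absorbs the whole load.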

> Nodetool netstats never seems to show
> any streaming data.  With past nodes it seemed like the node eventually
> fixed itself.

That node is storing hints for other nodes it believes are, or were at
some point, in the DOWN state. The first step to preventing this problem
from recurring is to understand why it believes (or believed) other
nodes are down. My conjecture is that you are overloading the
coordinating node and/or other nodes with the large volume of writes.

> Note that I am seeing severely degraded performance on this node when it
> attempts to compact the HintsColumnFamily to the point where I had to set
> setcompactionthroughput to 999 to ensure it doesn't run again (after which
> the node started serving requests much faster).

Depending on version, your 40 GB of hints could be in one 40 GB wide
row. Look at nodetool cfstats for HintsColumnFamily to determine if
this is the case.
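
One rough way to read the cfstats numbers: if the largest compacted row
accounts for nearly all of the live space, the hints are effectively one
wide row. The sketch below assumes you copy the two byte counts from
cfstats by hand (I am assuming the maximum compacted row size and the
live space used are the relevant figures), and the 0.9 threshold is
arbitrary.

```python
def wide_row_fraction(max_row_bytes, live_bytes):
    """Fraction of live space taken by the largest row, capped at 1.0."""
    if live_bytes <= 0:
        return 0.0
    return min(max_row_bytes / live_bytes, 1.0)


def looks_like_single_wide_row(max_row_bytes, live_bytes, threshold=0.9):
    """True when the largest row dominates the column family.

    The threshold is an arbitrary choice for illustration only.
    """
    return wide_row_fraction(max_row_bytes, live_bytes) >= threshold


if __name__ == "__main__":
    gib = 1024 ** 3
    # Made-up figures: a 38 GiB largest row out of 40 GiB of live data.
    print(looks_like_single_wide_row(38 * gib, 40 * gib))
```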

Do you see "Timed out replaying hint" messages, or are the hints being
successfully delivered?

You have two broad options:

1) purge your hints and then either reload the data (if reloading it
will be idempotent) or run "repair -pr" on every node in the cluster.
2) reduce load enough that hints will be successfully delivered,
reduce gc_grace_seconds on the hints column family to 0 and then do a
major compaction.

If I were you, I would probably do 1). The easiest way is to stop the
node and remove all sstables in the HintsColumnFamily.
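
Before deleting anything, it can help to list exactly what would go. The
sketch below only reports files and their total size and removes
nothing. The directory is whatever path holds the HintsColumnFamily
sstables on your install; it varies by version and configuration, so it
is passed in rather than guessed. Run it only with the node stopped.

```python
from pathlib import Path


def list_hint_sstables(hints_dir):
    """Return the regular files in hints_dir, sorted by name. Deletes nothing."""
    directory = Path(hints_dir)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())


def total_bytes(paths):
    """Sum the on-disk size of the given files."""
    return sum(p.stat().st_size for p in paths)


def report(hints_dir):
    """Print each candidate file and the total space it would free."""
    files = list_hint_sstables(hints_dir)
    for path in files:
        print(path)
    print(f"{len(files)} files, {total_bytes(files)} bytes")
```

Calling report() with the hints directory gives you a list to review
before removing the files by hand.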

=Rob

-- 
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
