1.1.7

Rob Coli <rc...@palominodb.com> wrote:

>Before we start... what version of Cassandra?
>
>On Fri, Dec 21, 2012 at 4:25 PM, Keith Wright <kwri...@nanigans.com> wrote:
>> This behavior seems to occur when I load a large
>> amount of data using that node as the coordinator.
>
>In general you want to use all nodes to coordinate, not a single one.
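A minimal Python sketch of spreading coordination across the cluster, rotating each load batch through every node instead of sending everything to one. This is an illustration only, not a Cassandra client API. The host addresses and the idea of a "batch" are hypothetical placeholders, and in practice your client library's load-balancing settings would normally handle node selection and failures.

```python
from itertools import cycle
from typing import Iterable, Iterator


def round_robin_coordinators(hosts: Iterable[str]) -> Iterator[str]:
    """Yield coordinator hosts in turn so no single node coordinates every write."""
    host_list = list(hosts)
    if not host_list:
        raise ValueError("need at least one host")
    return cycle(host_list)


def assign_batches(batches: list, hosts: Iterable[str]) -> list:
    """Pair each (hypothetical) load batch with a coordinator, rotating through all hosts."""
    coordinators = round_robin_coordinators(hosts)
    return [(next(coordinators), batch) for batch in batches]
```

For example, `assign_batches(["b1", "b2", "b3"], ["10.0.0.1", "10.0.0.2"])` sends the first and third batches to 10.0.0.1 and the second to 10.0.0.2.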
>
>> nodetool netstats never seems to show
>> any streaming data. With past nodes, it seemed like the node eventually
>> fixed itself.
>
>That node is storing hints for other nodes it believes are or were at
>some point in DOWN state. The first step to preventing this problem
>from recurring is to understand why it believes/d other nodes are
>down. My conjecture is that you are overloading the coordinating node
>and/or other nodes with the large volume of writes.
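A toy Python model of why hints pile up on a single coordinator: whenever it believes a replica is DOWN, it keeps the write locally as a hint and replays it only once the replica is seen as up again. This is a simplified illustration under stated assumptions (in-memory lists, a manual up/down flag), not Cassandra's actual hinted handoff implementation.

```python
from collections import defaultdict


class ToyCoordinator:
    """Toy coordinator that stores hints for replicas it believes are down."""

    def __init__(self, replicas):
        self.up = {r: True for r in replicas}
        self.hints = defaultdict(list)      # replica -> writes waiting to be replayed
        self.delivered = defaultdict(list)  # replica -> writes it has received

    def mark(self, replica, is_up):
        """Record what this coordinator currently believes about a replica."""
        self.up[replica] = is_up

    def write(self, key, value):
        for replica, is_up in self.up.items():
            if is_up:
                self.delivered[replica].append((key, value))
            else:
                # Replica believed DOWN: keep a hint locally instead of sending.
                self.hints[replica].append((key, value))

    def replay_hints(self, replica):
        """Deliver stored hints once the replica is seen as up again."""
        if self.up[replica]:
            self.delivered[replica].extend(self.hints.pop(replica, []))

    def pending_hints(self):
        return sum(len(h) for h in self.hints.values())
```

The more writes one coordinator accepts while replicas are flagged down, the larger `pending_hints()` grows, which is the effect described above.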
>
>> Note that I am seeing severely degraded performance on this node when it
>> attempts to compact the HintsColumnFamily to the point where I had to set
>> setcompactionthroughput to 999 to ensure it doesn't run again (after which
>> the node started serving requests much faster).
>
>Depending on version, your 40 GB of hints could be in one 40 GB wide
>row. Look at nodetool cfstats for HintsColumnFamily to determine if
>this is the case.
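To check this from a script, a minimal Python sketch could parse captured `nodetool cfstats` output. It assumes the 1.1-era text layout, where each section starts with a "Column Family:" line and includes a "Compacted row maximum size:" line in bytes. Verify that against your own output before relying on it.

```python
from typing import Optional


def max_row_size(cfstats_output: str, cf_name: str = "HintsColumnFamily") -> Optional[int]:
    """Return the 'Compacted row maximum size' (bytes) reported for cf_name, if found.

    Assumes the cfstats text layout described above; returns None otherwise.
    """
    in_cf = False
    for raw in cfstats_output.splitlines():
        line = raw.strip()
        if line.startswith("Column Family:"):
            # Entering a new column family section; track whether it is ours.
            in_cf = line.split(":", 1)[1].strip() == cf_name
        elif in_cf and line.startswith("Compacted row maximum size:"):
            return int(line.split(":", 1)[1].strip())
    return None
```

A maximum row size in the tens of gigabytes, close to the total size of your hints, would point to the single wide row case.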
>
>Do you see "Timed out replaying hint" messages, or are the hints being
>successfully delivered?
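One way to answer that is to count the timeout message in the node's log. A minimal Python sketch follows. It assumes the log contains the literal text "Timed out replaying hint" quoted above, and the log path in the usage line (a common package default) is an assumption about your install.

```python
from pathlib import Path

# Message text as quoted in the thread; assumed to appear verbatim in the log.
TIMEOUT_MARKER = "Timed out replaying hint"


def count_hint_timeouts(log_path: str) -> int:
    """Count log lines containing the hint-replay timeout message."""
    count = 0
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        for line in log:
            if TIMEOUT_MARKER in line:
                count += 1
    return count
```

Usage, with an assumed path: `count_hint_timeouts("/var/log/cassandra/system.log")`. A steadily rising count while the node is under load suggests hints are not being delivered.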
>
>You have two broad options:
>
>1) purge your hints and then either reload the data (if reloading it
>will be idempotent) or run "repair -pr" on every node in the cluster.
>2) reduce load enough that hints will be successfully delivered,
>reduce gc_grace_seconds on the hints CF to 0 and then do a major
>compaction.
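For the "repair -pr on every node" part of option 1, a minimal Python sketch that builds one `nodetool -h <host> repair -pr` invocation per node and runs them one at a time. Because `-pr` limits each repair to that node's primary range, running it on every node covers the ring once. The host list is hypothetical, the script assumes `nodetool` is on PATH and can reach each host, and running repairs sequentially is a cautious choice of this sketch, not something stated in the thread.

```python
import subprocess
from typing import List, Sequence


def repair_pr_commands(hosts: Sequence[str]) -> List[List[str]]:
    """Build one 'nodetool -h <host> repair -pr' command per node."""
    return [["nodetool", "-h", host, "repair", "-pr"] for host in hosts]


def run_sequentially(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run the repairs one at a time, or only print them when dry_run is True."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if a repair exits with a non-zero status.
            subprocess.run(cmd, check=True)
```

Start with `run_sequentially(repair_pr_commands(["node1", "node2", "node3"]))` to review the plan, then pass `dry_run=False` to execute it.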
>
>If I were you, I would probably do 1). The easiest way is to stop the
>node and remove all sstables in the HintsColumnFamily.
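The purge step (with the node already stopped) could be scripted roughly as below. This is a hedged sketch. It assumes hint sstable file names start with "system-HintsColumnFamily-" somewhere under `<data_dir>/system`, which should be checked against your own data directory. It does not stop the node for you, and `dry_run` defaults to True so you can review the file list before deleting anything.

```python
from pathlib import Path
from typing import List


def hint_sstable_files(data_dir: str) -> List[Path]:
    """List files that look like HintsColumnFamily sstable components.

    Assumes names start with "system-HintsColumnFamily-" under <data_dir>/system.
    """
    system_dir = Path(data_dir) / "system"
    return sorted(p for p in system_dir.rglob("system-HintsColumnFamily-*") if p.is_file())


def purge_hints(data_dir: str, dry_run: bool = True) -> List[Path]:
    """Delete hint sstables (only while the node is stopped); return what was matched."""
    files = hint_sstable_files(data_dir)
    for path in files:
        if not dry_run:
            path.unlink()
    return files
```

Calling `purge_hints(data_dir)` first only lists the matches; `purge_hints(data_dir, dry_run=False)` deletes them.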
>
>=Rob
>
>-- 
>=Robert Coli
>AIM&GTALK - rc...@palominodb.com
>YAHOO - rcoli.palominob
>SKYPE - rcoli_palominodb
