In our performance tests, we are seeing the same behaviour: the FlushWriter,
MutationStage, and MemtablePostFlusher Pending counts become non-zero. We
collect tpstats snapshots every 5 minutes, and the counts seem to clear after
~10-15 minutes. (The FlushWriter pool has an 'All time blocked' count of 540 in
the example below.)
We do not use secondary indexes or snapshots, and we do not use SSDs. We have a
4-node cluster with around 30-40 GB of data on each node. Each node has three
1 TB disks in a RAID 0 setup.
Currently we monitor tpstats every 5 minutes and alert if FlushWriter or
MutationStage has a non-zero Pending count. Any suggestions on whether this is
already a cause for concern, or whether we should alert only if the count
exceeds a larger threshold, say 10, or remains non-zero for longer than a
specified time?
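A minimal sketch of the kind of check we have in mind (a hypothetical helper,
not an official tool; the watched pool names, threshold, and sample counts are
assumptions to tune per cluster) would parse `nodetool tpstats` and alert only
when Pending stays elevated across consecutive samples:

#!/usr/bin/env python
# Sketch: alert only when Pending stays above a threshold for several
# consecutive 5-minute samples, rather than on any non-zero reading.
import subprocess
import time

WATCHED = {"FlushWriter", "MutationStage", "MemtablePostFlusher"}
PENDING_THRESHOLD = 10      # assumed threshold, per the question above
SAMPLES_BEFORE_ALERT = 3    # 3 x 5-minute samples = 15 minutes sustained
INTERVAL_SECONDS = 300      # matches our 5-minute collection interval

def pending_counts():
    """Return {pool_name: pending_count} for the watched thread pools."""
    out = subprocess.check_output(["nodetool", "tpstats"]).decode()
    counts = {}
    for line in out.splitlines():
        parts = line.split()
        # tpstats rows look like: name, active, pending, completed, ...
        if len(parts) >= 3 and parts[0] in WATCHED:
            counts[parts[0]] = int(parts[2])
    return counts

def main():
    strikes = dict.fromkeys(WATCHED, 0)
    while True:
        for pool, pending in pending_counts().items():
            strikes[pool] = strikes[pool] + 1 if pending > PENDING_THRESHOLD else 0
            if strikes[pool] >= SAMPLES_BEFORE_ALERT:
                print("ALERT: %s Pending=%d for %d consecutive samples"
                      % (pool, pending, strikes[pool]))
        time.sleep(INTERVAL_SECONDS)

if __name__ == "__main__":
    main()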
Pool Name                 Active  Pending  Completed  Blocked  All time blocked
ReadStage                      0        0   15685133        0                 0
RequestResponseStage           0        0   29880863        0                 0
MutationStage                  0        0   40457340        0                 0
ReadRepairStage                0        0     704322        0                 0
ReplicateOnWriteStage          0        0          0        0                 0
GossipStage                    0        0    2283062        0                 0
AntiEntropyStage               0        0          0        0                 0
MigrationStage                 0        0         70        0                 0
MemtablePostFlusher            1        1       1837        0                 0
StreamStage                    0        0          0        0                 0
FlushWriter                    1        1       1446        0               540
MiscStage                      0        0          0        0                 0
commitlog_archiver             0        0          0        0                 0
InternalResponseStage          0        0         43        0                 0
HintedHandoff                  0        0          3        0                 0
Thanks,
Arindam
-----Original Message-----
From: aaron morton [mailto:[email protected]]
Sent: Tuesday, June 25, 2013 10:29 PM
To: [email protected]
Subject: Re: about FlushWriter "All time blocked"
> FlushWriter                    0        0        191        0                12
This means there were 12 times when the code wanted to put a memtable in the
queue to be flushed to disk, but the queue was full.
The length of this queue is controlled by memtable_flush_queue_size
(https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L299)
and memtable_flush_writers.
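For reference, the relevant entries in the stock 1.2 cassandra.yaml look
roughly like this (a sketch of the 1.2 defaults; check your own file and the
comments there):

# memtable_flush_writers defaults to the number of data directories
# and is commented out in the stock file.
# memtable_flush_writers: 1

# Number of full memtables allowed to wait for a flush writer thread.
# At a minimum, set this to the maximum number of secondary indexes
# created on a single CF.
memtable_flush_queue_size: 4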
When a memtable cannot be queued, an internal lock around the commit log is
held, which prevents writes from being processed.
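As a toy illustration of that pattern (this is not Cassandra's actual code),
the effect is the same as a producer blocking on a bounded queue while holding
a lock that writers also need:

# Toy illustration only, not Cassandra internals: a full bounded queue
# makes the producer block while it holds a lock that writers need.
import queue
import threading

flush_queue = queue.Queue(maxsize=4)    # cf. memtable_flush_queue_size
commit_log_lock = threading.Lock()      # stand-in for the internal lock

def submit_memtable_for_flush(memtable):
    with commit_log_lock:               # writes also wait on this lock...
        flush_queue.put(memtable)       # ...so blocking here ('All time
                                        # blocked' +1) stalls writes too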
In general it means the IO system cannot keep up. It can sometimes happen when
a snapshot is taken, as all the CFs are flushed to disk at once. I also suspect
it happens sometimes when a commit log segment is flushed and there are a lot
of dirty CFs, but I've never proved it.
Increase memtable_flush_queue_size following the guidance in the yaml file. If
you do not use secondary indexes, are you using snapshots?
Hope that helps.
A
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 24/06/2013, at 3:41 PM, yue.zhang <[email protected]> wrote:
> 3 nodes
> CentOS
> CPU: 8 cores, memory: 32 GB
> cassandra 1.2.5
> my scenario: many counter increments; every node has one client program;
> throughput is 400 wps per client (it's very slow)
>
> my question:
> nodetool tpstats
> ---------------------------------
> Pool Name                 Active  Pending  Completed  Blocked  All time blocked
> ReadStage                      0        0       8453        0                 0
> RequestResponseStage           0        0  138303982        0                 0
> MutationStage                  0        0  172002988        0                 0
> ReadRepairStage                0        0          0        0                 0
> ReplicateOnWriteStage          0        0   82246354        0                 0
> GossipStage                    0        0    1052389        0                 0
> AntiEntropyStage               0        0          0        0                 0
> MigrationStage                 0        0          0        0                 0
> MemtablePostFlusher            0        0        670        0                 0
> FlushWriter                    0        0        191        0                12
> MiscStage                      0        0          0        0                 0
> commitlog_archiver             0        0          0        0                 0
> InternalResponseStage          0        0          0        0                 0
> HintedHandoff                  0        0         56        0                 0
> -----------------------------------
> FlushWriter “All time blocked” = 12. I restarted the node, but it did not
> help. Is this normal?
>
> thx
>
> -heipark
>
>