>> We do not use secondary indexes or snapshots Out of interest how many CF's do you have ?
Cheers ----------------- Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 28/06/2013, at 7:52 AM, Nate McCall <zznat...@gmail.com> wrote: > Non-zero for pending tasks is too transient. Try monitoring tpstats > with a (much) higher frequency and look for sustained threshold over a > duration. > > Then, using a percentage of the configuration values for the max - 75% > of memtable_flush_queue_size in this case - alert when it has been > higher than '3' for more than N time. (Start with N=60 seconds and go > from there). > > Also, that is a very high 'all time blocked' to 'completed' ratio for > FlushWriter. If iostat is happy, i'd do as Aaron suggested above and > turn up the memtable_flush_queue_size and play around with turning up > memtable_flush_writers (incrementally and separately for both of > course so you can see the effect). > > On Thu, Jun 27, 2013 at 2:27 AM, Arindam Barua <aba...@247-inc.com> wrote: >> In our performance tests, we are seeing similar FlushWriter, MutationStage, >> MemtablePostFlusher pending tasks become non-zero. We collect snapshots >> every 5 minutes, and they seem to clear after ~10-15 minutes though. (The >> flush writer has an 'All time blocked' count of 540 in the below example). >> >> We do not use secondary indexes or snapshots. We do not use SSDs. We have a >> 4-node cluster with around 30-40 GB data on each node. Each node has 3 1-TB >> disks with a RAID 0 setup. >> >> Currently we monitor the tpstats every 5 minutes, and alert if FlushWriter >> or MutationStage has a non-zero Pending count. Any suggestions if this is a >> cause of concern already, or, should we alert only if that count becomes >> greater than a bigger number, say 10, or if the count remains non-zero >> greater than a specified time. >> >> Pool Name Active Pending Completed Blocked All >> time blocked >> ReadStage 0 0 15685133 0 >> 0 >> RequestResponseStage 0 0 29880863 0 >> 0 >> MutationStage 0 0 40457340 0 >> 0 >> ReadRepairStage 0 0 704322 0 >> 0 >> ReplicateOnWriteStage 0 0 0 0 >> 0 >> GossipStage 0 0 2283062 0 >> 0 >> AntiEntropyStage 0 0 0 0 >> 0 >> MigrationStage 0 0 70 0 >> 0 >> MemtablePostFlusher 1 1 1837 0 >> 0 >> StreamStage 0 0 0 0 >> 0 >> FlushWriter 1 1 1446 0 >> 540 >> MiscStage 0 0 0 0 >> 0 >> commitlog_archiver 0 0 0 0 >> 0 >> InternalResponseStage 0 0 43 0 >> 0 >> HintedHandoff 0 0 3 0 >> 0 >> >> Thanks, >> Arindam >> >> -----Original Message----- >> From: aaron morton [mailto:aa...@thelastpickle.com] >> Sent: Tuesday, June 25, 2013 10:29 PM >> To: user@cassandra.apache.org >> Subject: Re: about FlushWriter "All time blocked" >> >>> FlushWriter 0 0 191 0 >>> 12 >> >> This means there were 12 times the code wanted to put an memtable in the >> queue to be flushed to disk but the queue was full. >> >> The length of this queue is controlled by the memtable_flush_queue_size >> https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L299 >> and memtable_flush_writers . >> >> When this happens an internal lock around the commit log is held which >> prevents writes from being processed. >> >> In general it means the IO system cannot keep up. It can sometimes happen >> when snapshot is used as all the CF's are flushed to disk at once. I also >> suspect it happens sometimes when a commit log segment is flushed and their >> are a lot of dirty CF's. But i've never proved it. >> >> Increase memtable_flush_queue_size following the help in the yaml file. If >> you do not use secondary indexes are you using snapshot? >> >> Hope that helps. >> A >> ----------------- >> Aaron Morton >> Freelance Cassandra Consultant >> New Zealand >> >> @aaronmorton >> http://www.thelastpickle.com >> >> On 24/06/2013, at 3:41 PM, yue.zhang <yue.zh...@chinacache.com> wrote: >> >>> 3 node >>> cent os >>> CPU 8core memory 32GB >>> cassandra 1.2.5 >>> my scenario: many counter incr, every node has one client program, >>> performance is 400 wps /every clicent (it’s so slowly) >>> >>> my question: >>> Ø nodetool tpstats >>> --------------------------------- >>> Pool Name Active Pending Completed Blocked All >>> time blocked >>> ReadStage 0 0 8453 0 >>> 0 >>> RequestResponseStage 0 0 138303982 0 >>> 0 >>> MutationStage 0 0 172002988 0 >>> 0 >>> ReadRepairStage 0 0 0 0 >>> 0 >>> ReplicateOnWriteStage 0 0 82246354 0 >>> 0 >>> GossipStage 0 0 1052389 0 >>> 0 >>> AntiEntropyStage 0 0 0 0 >>> 0 >>> MigrationStage 0 0 0 0 >>> 0 >>> MemtablePostFlusher 0 0 670 0 >>> 0 >>> FlushWriter 0 0 191 0 >>> 12 >>> MiscStage 0 0 0 0 >>> 0 >>> commitlog_archiver 0 0 0 0 >>> 0 >>> InternalResponseStage 0 0 0 0 >>> 0 >>> HintedHandoff 0 0 56 0 >>> 0 >>> ----------------------------------- >>> FlushWriter “All time blocked”=12,I restart the node,but no use,it’s >>> normally ? >>> >>> thx >>> >>> -heipark >>> >>> >> >>