MESSAGE-DESERIALIZER-POOL (MDP) is backing up because ROW-READ-STAGE (RRS) is full at 4096 pending reads. This means the node is not able to process reads as quickly as the requests are coming in. Make whatever is doing those reads less aggressive.
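One way to be less aggressive on the client side is simply to cap the number of reads in flight, so the application cannot outrun ROW-READ-STAGE. Below is a minimal sketch, not from the original thread, assuming a Java client; doRead stands in for whatever Thrift read call the application actually makes, and the concurrency limit is a hypothetical starting point to tune against your cluster:

    import java.util.concurrent.Callable;
    import java.util.concurrent.Semaphore;

    // Hypothetical client-side throttle: bound the number of concurrent reads.
    public class ThrottledReader {
        private final Semaphore inFlight;

        public ThrottledReader(int maxConcurrentReads) {
            this.inFlight = new Semaphore(maxConcurrentReads);
        }

        // Wrap every read in acquire/release; callers block once the limit is
        // reached, which is the back-pressure the overloaded node needs.
        public <T> T read(Callable<T> doRead) throws Exception {
            inFlight.acquire();
            try {
                return doRead.call();
            } finally {
                inFlight.release();
            }
        }
    }

Blocking the caller when the limit is reached is the point: the back-pressure lands in the application instead of piling up in MESSAGE-DESERIALIZER-POOL.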
As to why the reads are slow in the first place, usually this means you
are disk I/O bound. Posting your cfstats can help troubleshoot but is no
substitute for thinking about your application workload.

On Mon, Jul 26, 2010 at 12:32 PM, Anthony Molinaro
<antho...@alumni.caltech.edu> wrote:
> It's usually I/O which causes backup in MESSAGE-DESERIALIZER-POOL. You
> should check iostat and see what it looks like. It may be that you
> need more nodes in order to deal with the read/write rate. You can also
> use JMX to get latency values on reads and writes and see if the backup
> has a corresponding increase in latency. You may be able to get more
> out of your hardware and memory with row caching but that really depends
> on your data set.
>
> -Anthony
>
> On Mon, Jul 26, 2010 at 12:22:46PM -0700, Dathan Pattishall wrote:
>> I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7 cores,
>> RAID Disks).
>>
>> ~# /opt/cassandra/bin/nodetool --host=localhost --port=8181 tpstats
>> Pool Name                    Active   Pending   Completed
>> STREAM-STAGE                      0         0           0
>> RESPONSE-STAGE                    0         0      516280
>> ROW-READ-STAGE                    8      4096     1164326
>> LB-OPERATIONS                     0         0           0
>> *MESSAGE-DESERIALIZER-POOL        1    682008     1818682*
>> GMFD                              0         0        6467
>> LB-TARGET                         0         0           0
>> CONSISTENCY-MANAGER               0         0      661477
>> ROW-MUTATION-STAGE                0         0      998780
>> MESSAGE-STREAMING-POOL            0         0           0
>> LOAD-BALANCER-STAGE               0         0           0
>> FLUSH-SORTER-POOL                 0         0           0
>> MEMTABLE-POST-FLUSHER             0         0           4
>> FLUSH-WRITER-POOL                 0         0           4
>> AE-SERVICE-STAGE                  0         0           0
>> HINTED-HANDOFF-POOL               0         0           3
>>
>> EQX r...@cass04:~# vmstat -n 1
>>
>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>  r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
>>  6 10   7096 121816  16244 10375492    0    0     1     3     0     0  5  1 94  0  0
>>  2 10   7096 116484  16248 10381144    0    0  5636     4 21210  9820  2  1 79 18  0
>>  1  9   7096 108920  16248 10387592    0    0  6216     0 21439  9878  2  1 81 16  0
>>  0  9   7096 129108  16248 10364852    0    0  6024     0 23280  8753  2  1 80 17  0
>>  2  9   7096 122460  16248 10370908    0    0  6072     0 20835  9461  2  1 83 14  0
>>  2  8   7096 115740  16260 10375752    0    0  5168   292 21049  9511  3  1 77 20  0
>>  1 10   7096 108424  16260 10382300    0    0  6244     0 21483  8981  2  1 75 22  0
>>  3  8   7096 125028  16260 10364104    0    0  5584     0 21238  9436  2  1 81 16  0
>>  3  9   7096 117928  16260 10370064    0    0  5988     0 21505 10225  2  1 77 19  0
>>  1  8   7096 109544  16260 10376640    0    0  6340    28 20840  8602  2  1 80 18  0
>>  0  9   7096 127028  16240 10357652    0    0  5984     0 20853  9158  2  1 79 18  0
>>  9  0   7096 121472  16240 10363492    0    0  5716     0 20520  8489  1  1 82 16  0
>>  3  9   7096 112668  16240 10369872    0    0  6404     0 21314  9459  2  1 84 13  0
>>  1  9   7096 127300  16236 10353440    0    0  5684     0 38914 10068  2  1 76 21  0
>>
>> *But the 16 cores are hardly utilized. Which indicates to me there is
>> some bad thread thrashing, but why?*
>>
>>   1  [|||||                     8.3%]     Tasks: 1070 total, 1 running
>>   2  [                          0.0%]     Load average: 8.34 9.05 8.82
>>   3  [                          0.0%]     Uptime: 192 days(!), 15:29:52
>>   4  [|||||||||||              17.9%]
>>   5  [|||||                     5.7%]
>>   6  [||                        1.3%]
>>   7  [||                        2.6%]
>>   8  [|                         0.6%]
>>   9  [|                         0.6%]
>>  10  [||                        1.9%]
>>  11  [||                        1.9%]
>>  12  [||                        1.9%]
>>  13  [||                        1.3%]
>>  14  [|                         0.6%]
>>  15  [||                        1.3%]
>>  16  [|                         0.6%]
>>  Mem[||||||||||||||||||||||||||||||||||||||||||||1791/12028MB]
>>  Swp[|                                               6/1983MB]
>>
>>   PID USER  PRI NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
>> 30269 root   40  0 14100  2116   900 R  4.0  0.0  0:00.49 htop
>> 24878 root   40  0 20.6G 8345M 6883M D  3.0 69.4  1:23.03 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24879 root   40  0 20.6G 8345M 6883M D  3.0 69.4  1:22.93 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24874 root   40  0 20.6G 8345M 6883M D  2.0 69.4  1:22.73 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24880 root   40  0 20.6G 8345M 6883M D  2.0 69.4  1:22.93 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24875 root   40  0 20.6G 8345M 6883M D  2.0 69.4  1:23.17 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24658 root   40  0 20.6G 8345M 6883M D  2.0 69.4  1:23.06 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24877 root   40  0 20.6G 8345M 6883M S  2.0 69.4  1:23.43 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24873 root   40  0 20.6G 8345M 6883M D  1.0 69.4  1:23.65 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24876 root   40  0 20.6G 8345M 6883M S  1.0 69.4  1:23.62 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24942 root   40  0 20.6G 8345M 6883M S  1.0 69.4  0:23.50 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24943 root   40  0 20.6G 8345M 6883M S  0.0 69.4  0:29.53 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24933 root   40  0 20.6G 8345M 6883M S  0.0 69.4  0:22.57 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24939 root   40  0 20.6G 8345M 6883M S  0.0 69.4  0:12.73 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 25280 root   40  0 20.6G 8345M 6883M S  0.0 69.4  0:00.10 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro                           <antho...@alumni.caltech.edu>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
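A note on Anthony's suggestion to pull latency values over JMX: besides jconsole, they can be read programmatically. The sketch below is an illustration, not something from the thread; it assumes a 0.6-era StorageProxy MBean (org.apache.cassandra.service:type=StorageProxy) with RecentReadLatencyMicros / RecentWriteLatencyMicros attributes, and reuses the JMX port 8181 from the nodetool invocation above. Bean and attribute names move around between Cassandra versions, so verify them against your node first.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Illustrative latency poller; bean and attribute names are assumptions for
    // a 0.6-era node and should be checked with jconsole on your version.
    public class LatencyCheck {
        public static void main(String[] args) throws Exception {
            // 8181 is the JMX port used with nodetool earlier in the thread.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:8181/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName proxy =
                        new ObjectName("org.apache.cassandra.service:type=StorageProxy");
                System.out.println("recent read latency (us):  "
                        + mbs.getAttribute(proxy, "RecentReadLatencyMicros"));
                System.out.println("recent write latency (us): "
                        + mbs.getAttribute(proxy, "RecentWriteLatencyMicros"));
            } finally {
                connector.close();
            }
        }
    }

Polling these once a second while the MESSAGE-DESERIALIZER-POOL backlog grows should show whether read latency climbs in step with the queue, which is the correlation Anthony describes.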