It's usually I/O which causes backup in MESSAGE-DESERIALIZER-POOL. You should check iostat and see what it looks like. It may be that you need more nodes in order to deal with the read/write rate. You can also use JMX to get latency values on reads and writes and see if the backup has a corresponding increase in latency. You may be able to get more out of your hardware and memory with row caching but that really depends on your data set.
-Anthony On Mon, Jul 26, 2010 at 12:22:46PM -0700, Dathan Pattishall wrote: > I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7 cores, > RAID Disks). > > ~# /opt/cassandra/bin/nodetool --host=localhost --port=8181 tpstats > Pool Name Active Pending Completed > STREAM-STAGE 0 0 0 > RESPONSE-STAGE 0 0 516280 > ROW-READ-STAGE 8 4096 1164326 > LB-OPERATIONS 0 0 0 > *MESSAGE-DESERIALIZER-POOL 1 682008 1818682* > GMFD 0 0 6467 > LB-TARGET 0 0 0 > CONSISTENCY-MANAGER 0 0 661477 > ROW-MUTATION-STAGE 0 0 998780 > MESSAGE-STREAMING-POOL 0 0 0 > LOAD-BALANCER-STAGE 0 0 0 > FLUSH-SORTER-POOL 0 0 0 > MEMTABLE-POST-FLUSHER 0 0 4 > FLUSH-WRITER-POOL 0 0 4 > AE-SERVICE-STAGE 0 0 0 > HINTED-HANDOFF-POOL 0 0 3 > > EQX r...@cass04:~# vmstat -n 1 > > procs -----------memory---------- ---swap-- -----io---- --system-- > -----cpu------ > r b swpd free buff cache si so bi bo in cs us sy id > wa st > 6 10 7096 121816 16244 10375492 0 0 1 3 0 0 5 1 > 94 0 0 > 2 10 7096 116484 16248 10381144 0 0 5636 4 21210 9820 2 1 > 79 18 0 > 1 9 7096 108920 16248 10387592 0 0 6216 0 21439 9878 2 1 > 81 16 0 > 0 9 7096 129108 16248 10364852 0 0 6024 0 23280 8753 2 1 > 80 17 0 > 2 9 7096 122460 16248 10370908 0 0 6072 0 20835 9461 2 1 > 83 14 0 > 2 8 7096 115740 16260 10375752 0 0 5168 292 21049 9511 3 1 > 77 20 0 > 1 10 7096 108424 16260 10382300 0 0 6244 0 21483 8981 2 1 > 75 22 0 > 3 8 7096 125028 16260 10364104 0 0 5584 0 21238 9436 2 1 > 81 16 0 > 3 9 7096 117928 16260 10370064 0 0 5988 0 21505 10225 2 1 > 77 19 0 > 1 8 7096 109544 16260 10376640 0 0 6340 28 20840 8602 2 1 > 80 18 0 > 0 9 7096 127028 16240 10357652 0 0 5984 0 20853 9158 2 1 > 79 18 0 > 9 0 7096 121472 16240 10363492 0 0 5716 0 20520 8489 1 1 > 82 16 0 > 3 9 7096 112668 16240 10369872 0 0 6404 0 21314 9459 2 1 > 84 13 0 > 1 9 7096 127300 16236 10353440 0 0 5684 0 38914 10068 2 1 > 76 21 0 > > > *But the 16 cores are hardly utilized. Which indicates to me there is some > bad thread thrashing, but why? * > > > > 1 [||||| 8.3%] Tasks: > 1070 total, 1 running > 2 [ 0.0%] Load > average: 8.34 9.05 8.82 > 3 [ 0.0%] Uptime: > 192 days(!), 15:29:52 > 4 [||||||||||| 17.9%] > 5 [||||| 5.7%] > 6 [|| 1.3%] > 7 [|| 2.6%] > 8 [| 0.6%] > 9 [| 0.6%] > 10 [|| 1.9%] > 11 [|| 1.9%] > 12 [|| 1.9%] > 13 [|| 1.3%] > 14 [| 0.6%] > 15 [|| 1.3%] > 16 [| 0.6%] > Mem[||||||||||||||||||||||||||||||||||||||||||||1791/12028MB] > Swp[| 6/1983MB] > > PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command > 30269 root 40 0 14100 2116 900 R 4.0 0.0 0:00.49 htop > 24878 root 40 0 20.6G 8345M 6883M D 3.0 69.4 1:23.03 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24879 root 40 0 20.6G 8345M 6883M D 3.0 69.4 1:22.93 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24874 root 40 0 20.6G 8345M 6883M D 2.0 69.4 1:22.73 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24880 root 40 0 20.6G 8345M 6883M D 2.0 69.4 1:22.93 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24875 root 40 0 20.6G 8345M 6883M D 2.0 69.4 1:23.17 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24658 root 40 0 20.6G 8345M 6883M D 2.0 69.4 1:23.06 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24877 root 40 0 20.6G 8345M 6883M S 2.0 69.4 1:23.43 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24873 root 40 0 20.6G 8345M 6883M D 1.0 69.4 1:23.65 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24876 root 40 0 20.6G 8345M 6883M S 1.0 69.4 1:23.62 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24942 root 40 0 20.6G 8345M 6883M S 1.0 69.4 0:23.50 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24943 root 40 0 20.6G 8345M 6883M S 0.0 69.4 0:29.53 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24933 root 40 0 20.6G 8345M 6883M S 0.0 69.4 0:22.57 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 24939 root 40 0 20.6G 8345M 6883M S 0.0 69.4 0:12.73 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark > 25280 root 40 0 20.6G 8345M 6883M S 0.0 69.4 0:00.10 > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark -- ------------------------------------------------------------------------ Anthony Molinaro <antho...@alumni.caltech.edu>