I restarted Cassandra on that node to clear out that queue, reduced the memory available to Java to 4GB, and now I'm able to read with 8 concurrent threads at about 110 reads/second.
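For anyone curious, the 4GB cap is just the JVM heap limit; roughly this in bin/cassandra.in.sh (treat it as a sketch, since the file layout and default values vary by Cassandra version):

# bin/cassandra.in.sh -- cap the heap so the kernel keeps RAM free for the
# page cache, which is what actually serves most SSTable reads
JVM_OPTS="$JVM_OPTS \
        -Xms1G \
        -Xmx4G"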
Running iostat -x I see a large amount of time in await and only a small amount in svctm, indicating the device is responding quickly but the queue length is excessive. All the data filesystems are xfs on 64-bit Debian, Cassandra is the only thing reading/writing them, and the CPUs are napping.

$ iostat -x
Linux 2.6.31-14-generic (record)        04/08/2010      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.77    0.06    5.62   14.32    0.00   62.24

Device:  rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda       14.56    87.13    5.62    2.02   181.04   712.92   116.98     0.14   18.66   4.64   3.55
sdb        0.03     0.00    0.02    0.00     0.34     0.01    14.22     0.00   10.00   9.99   0.02
sdc        0.18    83.78   91.09    2.13  4159.65  2024.82    66.34     5.35   57.38   3.43  31.95

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, April 08, 2010 10:12 AM
To: user@cassandra.apache.org
Subject: Re: Some insight into the slow read speed. Where to go from here? RC1 MESSAGE-DESERIALIZER-POOL

Have you checked iostat -x ?

On Thu, Apr 8, 2010 at 9:45 AM, Mark Jones <mjo...@imagehawk.com> wrote:
> I don't see any way to increase the # of active Deserializers in
> storage-conf.xml
>
> tpstats more than 8 hours after inserts/reads stopped:
>
> Pool Name                    Active   Pending      Completed
> FILEUTILS-DELETE-POOL             0         0            227
> STREAM-STAGE                      0         0              1
> RESPONSE-STAGE                    0         0       76724280
> ROW-READ-STAGE                    8      4091        1138277
> LB-OPERATIONS                     0         0              0
> MESSAGE-DESERIALIZER-POOL         1   1849826       78135012
> GMFD                              0         0         136886
> LB-TARGET                         0         0              0
> CONSISTENCY-MANAGER               0         0           1803
> ROW-MUTATION-STAGE                0         0       68669717
> MESSAGE-STREAMING-POOL            0         0              0
> LOAD-BALANCER-STAGE               0         0              0
> FLUSH-SORTER-POOL                 0         0              0
> MEMTABLE-POST-FLUSHER             0         0            438
> FLUSH-WRITER-POOL                 0         0            438
> AE-SERVICE-STAGE                  0         0              3
> HINTED-HANDOFF-POOL               0         0              3
>
> More than 30 minutes later (with no reads or writes to the cluster):
>
> Pool Name                    Active   Pending      Completed
> FILEUTILS-DELETE-POOL             0         0            227
> STREAM-STAGE                      0         0              1
> RESPONSE-STAGE                    0         0       76724280
> ROW-READ-STAGE                    8      4098        1314304
> LB-OPERATIONS                     0         0              0
> MESSAGE-DESERIALIZER-POOL         1   1663578       78336771
> GMFD                              0         0         142651
> LB-TARGET                         0         0              0
> CONSISTENCY-MANAGER               0         0           1803
> ROW-MUTATION-STAGE                0         0       68669717
> MESSAGE-STREAMING-POOL            0         0              0
> LOAD-BALANCER-STAGE               0         0              0
> FLUSH-SORTER-POOL                 0         0              0
> MEMTABLE-POST-FLUSHER             0         0            438
> FLUSH-WRITER-POOL                 0         0            438
> AE-SERVICE-STAGE                  0         0              3
> HINTED-HANDOFF-POOL               0         0              3
>
> The other 2 nodes in the cluster have pending counts of 0, but this node
> seems hung indefinitely, processing requests that should have long ago
> timed out for the client.
>
> top is showing a huge amount of I/O wait, but I'm not sure how to track
> where the wait is happening below here. I now have jconsole up and running
> on this machine, and the memory usage appears to be a sawtooth wave, going
> from 1GB up to 4GB over 3 hours, then plunging back to 1GB and resuming
> its climb.
>
> top - 08:33:40 up 1 day, 19:25,  4 users,  load average: 7.75, 7.96, 8.16
> Tasks: 177 total,   2 running, 175 sleeping,   0 stopped,   0 zombie
> Cpu(s): 16.6%us,  7.2%sy,  0.0%ni, 34.5%id, 41.1%wa,  0.0%hi,  0.6%si,  0.0%st
> Mem:   8123068k total,  8062240k used,    60828k free,     2624k buffers
> Swap: 12699340k total,  1951504k used, 10747836k free,  3757300k cached
>
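The ROW-READ-STAGE sitting at 8 active in the tpstats above lines up with the ConcurrentReads setting in storage-conf.xml (8 was the shipped default, if I remember right). The deserializer pool itself isn't exposed in that file, but the read stage is; illustrative values only:

<!-- storage-conf.xml: thread counts for ROW-READ-STAGE / ROW-MUTATION-STAGE.
     Raising ConcurrentReads only helps if the disks can keep up. -->
<ConcurrentReads>8</ConcurrentReads>
<ConcurrentWrites>32</ConcurrentWrites>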
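To see whether those ROW-READ-STAGE / MESSAGE-DESERIALIZER-POOL backlogs ever drain, a crude watch loop like this works (sketch only: assumes nodetool is on the PATH, and the host flag is spelled -host on older builds, -h on newer ones):

while true; do
    date
    # pull just the two suspect stages out of tpstats
    nodetool -host localhost tpstats | egrep 'Pool Name|ROW-READ-STAGE|MESSAGE-DESERIALIZER-POOL'
    # the second iostat report covers the last 5 seconds; the first (like the
    # output above) is only the average since boot
    iostat -x 5 2
    sleep 30
done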