We have a 3 node cluster running cassandra 1.2.12, they are pretty big machines 64G ram with 16 cores, cassandra heap is 8G.
The interesting observation is that, when I send traffic to one node its performance is 2x more than when I send traffic to all the nodes. We ran 1.0.11 on the same box and we observed a slight dip but not half as seen with 1.2.12. In both the cases we were writing with LOCAL_QUORUM. Changing CL to ONE make a slight improvement but not much. The read_Repair_chance is 0.1. We see some compactions running. following is my iostat -x output, sda is the ssd (for commit log) and sdb is the spinner. avg-cpu: %user %nice %system %iowait %steal %idle 66.46 0.00 8.95 0.01 0.00 24.58 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 27.60 0.00 4.40 0.00 256.00 58.18 0.01 2.55 1.32 0.58 sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda2 0.00 27.60 0.00 4.40 0.00 256.00 58.18 0.01 2.55 1.32 0.58 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.00 5.33 2.67 0.16 dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-3 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.24 9.80 0.13 0.32 dm-4 0.00 0.00 0.00 6.60 0.00 52.80 8.00 0.01 1.36 0.55 0.36 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 24.80 0.00 198.40 8.00 0.29 11.60 0.13 0.32 I can see I am cpu bound here but couldn't figure out exactly what is causing it, is this caused by GC or Compaction ? I am thinking it is compaction, I see a lot of context switches and interrupts in my vmstat output. I don't see GC activity in the logs but see some compaction activity. Has anyone seen this ? or know what can be done to free up the CPU. Thanks, Sandeep