We have a 3-node cluster running Cassandra 1.2.12. The machines are fairly
large: 64 GB RAM and 16 cores each, with an 8 GB Cassandra heap.

The interesting observation is that when I send traffic to a single node,
performance is about 2x what I see when I send traffic to all the nodes. We
ran 1.0.11 on the same boxes and observed a slight dip, but nothing like the
halving we see with 1.2.12. In both cases we were writing with LOCAL_QUORUM.
Changing the CL to ONE makes a slight improvement, but not much.
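
For context on why ONE might help only slightly, here is a minimal sketch of the quorum arithmetic. The replication factor is not stated in this post, so `ASSUMED_RF = 3` is an assumption for illustration only. At either consistency level the write is still sent to every replica; only the number of acknowledgements the coordinator waits for changes.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM / LOCAL_QUORUM write."""
    return replication_factor // 2 + 1


# Assumption: the replication factor is not given in the post.
ASSUMED_RF = 3

# Acknowledgements the coordinator waits for at each consistency level.
acks_needed = {"ONE": 1, "LOCAL_QUORUM": quorum(ASSUMED_RF)}

for level, acks in acks_needed.items():
    print(f"{level}: coordinator waits for {acks} of {ASSUMED_RF} replicas")
```

Under that assumption the per-write work on the replicas is the same for both levels, which would be consistent with the small difference observed.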

read_repair_chance is 0.1. We see some compactions running.

The following is my iostat -x output; sda is the SSD (for the commit log)
and sdb is the spinner.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          66.46    0.00    8.95    0.01    0.00   24.58

Device:   rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda         0.00   27.60   0.00   4.40    0.00  256.00    58.18     0.01   2.55   1.32   0.58
sda1        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sda2        0.00   27.60   0.00   4.40    0.00  256.00    58.18     0.01   2.55   1.32   0.58
sdb         0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sdb1        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
dm-0        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
dm-1        0.00    0.00   0.00   0.60    0.00    4.80     8.00     0.00   5.33   2.67   0.16
dm-2        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
dm-3        0.00    0.00   0.00  24.80    0.00  198.40     8.00     0.24   9.80   0.13   0.32
dm-4        0.00    0.00   0.00   6.60    0.00   52.80     8.00     0.01   1.36   0.55   0.36
dm-5        0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
dm-6        0.00    0.00   0.00  24.80    0.00  198.40     8.00     0.29  11.60   0.13   0.32



I can see I am CPU bound here, but I couldn't figure out exactly what is
causing it. Is it GC or compaction? I suspect compaction; I also see a lot
of context switches and interrupts in my vmstat output.
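
The CPU-bound reading can be checked with a little arithmetic on the iostat sample. This is only a sketch: the figures are copied from the avg-cpu line and the %util column of the output in this post, and `IOWAIT_THRESHOLD` is a hypothetical cutoff chosen for illustration, not a standard value.

```python
# Values copied from the avg-cpu line of the iostat -x sample in this post.
cpu = {"user": 66.46, "nice": 0.00, "system": 8.95,
       "iowait": 0.01, "steal": 0.00, "idle": 24.58}

# %util of the two physical devices in the same sample.
disk_util = {"sda": 0.58, "sdb": 0.00}

# Time spent executing code, as opposed to idling or waiting on I/O.
busy = cpu["user"] + cpu["nice"] + cpu["system"]

# Hypothetical threshold (an assumption, for illustration only).
IOWAIT_THRESHOLD = 5.0
io_bound = cpu["iowait"] > IOWAIT_THRESHOLD

print(f"busy={busy:.2f}%  iowait={cpu['iowait']:.2f}%  io_bound={io_bound}")
print(f"busiest disk: {max(disk_util, key=disk_util.get)} "
      f"at {max(disk_util.values()):.2f}% util")
```

With roughly three quarters of CPU time in user plus system and both disks under 1% utilized, the sample points at CPU rather than disk, though a single sample cannot say which threads are responsible.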

I don't see GC activity in the logs, but I do see some compaction activity.
Has anyone seen this, or does anyone know what can be done to free up the CPU?

Thanks,
Sandeep
