*Environment*
1) Currently Cassandra 2.1.3; it was upgraded from 2.1.0 (the upgrade was
suggested by Al Tobey from DataStax)
2) not using vnodes
3) Two data centres: 5 nodes in one DC (DC_A), 4 nodes in the second DC (DC_B)
4) each node is set up on a physical box with two 16-core HT Xeon
processors (E5-2660), 64GB RAM and 10x2TB 7.2K SAS disks (one for the
commitlog, nine for Cassandra data file directories), 1Gbps network. No
RAID, only JBOD.
5) 3500 writes per second; we write only to DC_A at LOCAL_QUORUM, with
RF=5 in the local DC_A, on our largest CFs (see the keyspace sketch after
this list)
6) acceptable write times (usually a few ms unless we encounter some
problem within the cluster)
7) minimal reads (usually none, sometimes a few)
8) iostat looks OK ->
http://serverfault.com/questions/666136/interpreting-disk-stats-using-sar
9) We use SizeTiered compaction; we converted to it from Leveled (see the
ALTER TABLE sketch after this list)
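
For context, the replication setup looks roughly like this (the keyspace
name and DC_B's RF are placeholders for illustration, not our exact schema):

    -- sketch only: 'my_ks' and DC_B's RF are illustrative
    CREATE KEYSPACE my_ks WITH replication =
        {'class': 'NetworkTopologyStrategy', 'DC_A': 5, 'DC_B': 4};

With RF=5 in DC_A, each LOCAL_QUORUM write waits for 3 of the 5 local
replicas.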
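The conversion from Leveled to SizeTiered was done per CF along these
lines (table name is a placeholder):

    -- switches the CF to SizeTieredCompactionStrategy (Cassandra 2.1 CQL)
    ALTER TABLE my_ks.my_cf WITH compaction =
        {'class': 'SizeTieredCompactionStrategy'};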


*Problems*
Currently we see two main problems:
1) In DC_A we have a really large number of pending compactions (400-700
depending on the node). In DC_B everything is fine (10 is the short-term
maximum, usually it is less than 3). The number of pending compactions
does not go down over the long term (counts taken per node, see the
sketch after this list).
2) In DC_A reads usually fail with a timeout exception. In DC_B reads are
fast and work without problems.
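
The pending compaction counts above come from standard nodetool output,
run on each node, along these lines:

    nodetool compactionstats   # pending tasks + currently running compactions
    nodetool tpstats           # CompactionExecutor pending/blocked, dropped messages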

*The question*
Is there a way to diagnose what is wrong with my servers? I understand
that DC_A is doing much more work than DC_B, but we tested a much bigger
load on a test machine for a few days and everything was fine. I can post
output from the standard diagnostics sketched below if that helps.
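
Diagnostics I can gather from the DC_A nodes (keyspace name below is a
placeholder):

    nodetool cfstats my_ks       # per-CF latencies, SSTable counts, pending flushes
    nodetool netstats            # streaming and hinted handoff activity
    nodetool compactionhistory   # what has actually been compacted recently
    iostat -x 5                  # per-disk utilisation (as in the serverfault link above)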
