*Environment*
1) Currently Cassandra 2.1.3; it was upgraded from 2.1.0 (upgrade suggested by Al Tobey from DataStax)
2) not using vnodes
3) two data centres: 5 nodes in one DC (DC_A), 4 nodes in the second DC (DC_B)
4) each node is set up on a physical box with two 16-core HT Xeon processors (E5-2660), 64GB RAM and 10x2TB 7.2K SAS disks (one for the commitlog, nine for the Cassandra data file directories), 1Gbps network. No RAID, only JBOD.
5) 3500 writes per second; I write only to DC_A with LOCAL_QUORUM and RF=5 in the local DC_A, on our largest CFs
6) acceptable write times (usually a few ms, unless we hit some problem within the cluster)
7) minimal reads (usually none, sometimes a few)
8) iostat looks OK -> http://serverfault.com/questions/666136/interpreting-disk-stats-using-sar
9) we use SizeTiered compaction (STCS); we converted to it from Leveled (LCS). The current settings can be checked as sketched right after this list.
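For reference, a minimal sketch of how the compaction strategy and throttle can be verified, assuming only stock nodetool/cqlsh from 2.1 (my_keyspace, my_big_cf and <node> are placeholders, not our real names):

```
# <node> is any node in DC_A
# current compaction throttle (the 2.1 default is 16 MB/s unless changed)
nodetool -h <node> getcompactionthroughput

# confirm the strategy actually in effect on one of the largest CFs;
# DESCRIBE prints the compaction class and its options
cqlsh <node>
cqlsh> DESCRIBE TABLE my_keyspace.my_big_cf;
```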
*Problems*
We currently see two main problems:
1) In DC_A we have a really large number of pending compactions (400-700, depending on the node). In DC_B everything is fine (10 is the short-term maximum, usually fewer than 3). The pending compaction count does not go down over the long term (a first diagnostic pass is sketched at the end of this post).
2) In DC_A reads usually fail with a timeout exception. In DC_B reads are fast and work without problems (see the read-latency commands at the end of this post).

*The question*
How can I diagnose what is wrong with my servers? I understand that DC_A is doing much more work than DC_B, but we tested a much bigger load on a test machine for a few days and everything was fine.
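For problem 1, a minimal first diagnostic pass on a DC_A node, using only stock nodetool (the 64 MB/s value below is just an example, not a recommendation):

```
# what is pending and what is actually compacting right now
nodetool compactionstats

# are the CompactionExecutor/MemtableFlushWriter pools backed up,
# and are any messages being dropped?
nodetool tpstats

# if the nodes simply cannot keep up, the throttle can be raised
# temporarily to let them catch up (value in MB/s; watch iostat on
# the JBOD disks while doing this)
nodetool setcompactionthroughput 64
```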
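For problem 2, per-node read latency on the hot CFs can be broken down like this; a high "SSTables per read" count would tie the timeouts back to the compaction backlog (keyspace/CF names are placeholders again):

```
# latency distribution and SSTables touched per read for the big CF
nodetool cfhistograms my_keyspace my_big_cf

# per-CF read latency and live SSTable count on this node
nodetool cfstats my_keyspace.my_big_cf

# coordinator-level read/write latencies on the node the client talks to
nodetool proxyhistograms
```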