*Environment*
- Cassandra 2.1.0
- 5 nodes in one DC (DC_A), 4 nodes in a second DC (DC_B)
- 2500 writes per second; I write only to DC_A with local_quorum
- minimal reads (usually none, sometimes a few)

*Problem*
After a few weeks of running I cannot read any data from my cluster, because I get a ReadTimeoutException like the following:

    ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 - Error occurred during processing of message.
    com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 2 responses.

To be precise, this is not the only problem in my cluster. The second one was described here: Cassandra GC takes 30 seconds and hangs node <http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node>, and I will try the fix from CASSANDRA-6541 <http://issues.apache.org/jira/browse/CASSANDRA-6541> as leshkin suggested.

*Diagnose*
I tried some of the tools presented by Jon Haddad at http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/ and got some strange results.

I ran the same query in DC_A and DC_B with tracing enabled. The query is simple:

    SELECT * FROM X.customer_events WHERE customer='1234567' AND utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10);

where the table is defined as follows:

    CREATE TABLE drev_maelstrom.customer_events (customer text, utc_day int, bucket int, event_time bigint, event_id blob, event_type int, event blob, PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id, event_type)[...]

Results of the query:

1) In DC_B the query finished in less than 0.22 seconds. In DC_A it took more than 2.5 seconds (~10 times longer).
   -> the problem is that bucket can be in the range from -128 to 256
2) In DC_B it checked ~1000 SSTables, with lines like:
       Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] | 2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782
   whereas in DC_A it is:
       Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] | 2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527
3) The total number of records was the same in both DCs.

*Question*
The question is quite simple: how can I speed up DC_A? It is my primary DC; DC_B is mostly for backup, and there are a lot of network partitions between A and B. Maybe I should check something more, but I just don't have an idea what it should be.
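
To compare the two traces more systematically than by eye, a small Python sketch like the one below could count the "Bloom filter allows skipping sstable" rows per node. It assumes the trace output was saved as plain text from cqlsh in the row format quoted above (activity | timestamp | source | source_elapsed, with source_elapsed in microseconds). The function name summarize_skips and the output keys are made up for illustration; this is not part of any Cassandra tooling.

```python
import re
from collections import defaultdict

# Matches trace rows of the shape quoted in the post:
#   Bloom filter allows skipping sstable <id> [<thread>] | <timestamp> | <source> | <source_elapsed>
SKIP_LINE = re.compile(
    r"Bloom filter allows skipping sstable (?P<sstable>\d+)"
    r"\s+\[(?P<thread>[^\]]+)\]\s*\|\s*(?P<timestamp>[^|]+?)\s*\|"
    r"\s*(?P<source>[\d.]+)\s*\|\s*(?P<elapsed>\d+)"
)


def summarize_skips(trace_text):
    """Return, per source node, how many distinct sstables were skipped
    and the largest source_elapsed value seen on a skip row."""
    sstables = defaultdict(set)
    max_elapsed = defaultdict(int)
    for line in trace_text.splitlines():
        match = SKIP_LINE.search(line)
        if match is None:
            continue  # not a bloom-filter skip row
        node = match.group("source")
        sstables[node].add(int(match.group("sstable")))
        max_elapsed[node] = max(max_elapsed[node], int(match.group("elapsed")))
    return {
        node: {"skipped_sstables": len(ids), "max_elapsed_us": max_elapsed[node]}
        for node, ids in sstables.items()
    }


if __name__ == "__main__":
    # The two sample rows from the traces above, one per DC.
    sample = (
        "Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] | "
        "2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782\n"
        "Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] | "
        "2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527\n"
    )
    for node, stats in summarize_skips(sample).items():
        print(node, stats)
```

Running it over the full trace from each DC would give a per-node SSTable count and the elapsed time at the last skip row, which makes the DC_A vs DC_B difference easier to quantify.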