hey list, i've been trying to understand why we are seeing rather nasty read latency peaks in as much as 2% of our total read ( not sure what the underlying thrift call is, should be get_slice ) requests that we have been **unable to tie to garbage collection or blocking I/O.** This is what i mean by "nasty" : Mean: 1.186341, Max: 41.912 stDev: 2.2490502737495333, 75th:1.0242499999999999, 95th: 2.1103499999999977, 98th: 5.2102600000000034, 99th: 13.24835000000001, 999th: 41.89454000000041
This is measured around the Astyanax / Hector wrappers for running column slice queries and iterating over their results. FWIW they perform very similarly. Now, the mean is awesome - 1.2ms for a read is really really fast, i think thrift overhead makes up for probably 1ms alone. What i can't live with is pretty much everything thats beyond the 98th percentile. Perhaps my expectations are misaligned to begin with and this is great, but the dataset under test that yielded this result ( see below ) is extremely small - only a 1000 items per test made up of a few small dynamic composite columns and the read is being done sequentially, from a single threaded client. I just don't understand whats causing the peaks. Here's the complete run-down on the setup: The data is dynamic composite columns with string or integer values. The DynamicComposites themselves are made up of 3 Components each: <String,String,Integer> and <String,Integer,Integer> The query is getting all columns ( in this particular test, that will be a single column ) of one of the composite types ( <String, String, Integer> or <String, Integer, Integer> ) for a particular string key. ( How this is achieved is more or less at https://gist.github.com/3207894 ) The production cluster is set up as follows: 30-40 read/sec clients from multiple hosts, NetworkTopologyStrategy, 2 replicas per DC, read consistency: ONE, 2DCs, 2 nodes per DC ) 8GB multi-core physical hosts most settings on default in cassandra.yaml In an attempt to isolate the problem, i've done the following ( this is where the results above are coming from but we see roughly the same numbers in prod ) - created a short unit test around the ( two ) classes that wrap the Astyanax and Hector libraries in our application, in which i'm only testing the read performance - in setUp() i start Cassandra with CassandraStoreKeeperTest.cassandra = new CassandraDaemon(); CassandraStoreKeeperTest.cassandra.init(null); CassandraStoreKeeperTest.cassandra.start(); - same cassandra yaml as in prod, file at https://gist.github.com/3207894 ( mostly defaults ) - i pass the following flags to the JVM running my unit test: -server -Xmx2G and all the GC flags from cassandra-env.sh, plus the jamm java agent - i warm up the embedded cassandra with 1k reads and 1k writes - the test is 1000 single threaded, sequential reads via Hector and Astyanax ( no significant differences between the two ) via thrift to the embedded single-node Cassandra instance - i read back the 1000 items saved in the warm-up The results are Mean: 1.186341, Max: 41.912 stDev: 2.2490502737495333, 75th:1.0242499999999999, 95th: 2.1103499999999977, 98th: 5.2102600000000034, 99th: 13.24835000000001, 999th: 41.89454000000041 here's the GC output during the read [GC [ParNew Desired survivor size 1081344 bytes, new threshold 1 (max 1) - age 1: 281480 bytes, 281480 total : 17322K->308K(19136K), 0.0037375 secs] 28742K->11885K(83008K), 0.0038170 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] [GC [ParNew Desired survivor size 1081344 bytes, new threshold 1 (max 1) - age 1: 261360 bytes, 261360 total : 17332K->386K(19136K), 0.0034069 secs] 28909K->12098K(83008K), 0.0034849 secs] [Times: user=0.00 sys=0.01, real=0.00 secs] [GC [ParNew Desired survivor size 1081344 bytes, new threshold 1 (max 1) - age 1: 241336 bytes, 241336 total : 17410K->383K(19136K), 0.0035252 secs] 29122K->12209K(83008K), 0.0036062 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] Doesn't look like GC ( i could be wrong ) - what else could it be ? all suggestions are highly appreciated.. Thanks! Andras ps: this is with Cassandra 1.1.0