Thanks for the probing questions. Just got around to looking into this a bit more and discovered it's not exactly what I thought. It's actually a single multiget_slice with a fairly slow response time.
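For concreteness, from the application side the call looks roughly like this (a pycassa-level sketch; the keys are made up and the ConnectionPool-style setup is from the newer pycassa API, so treat it as illustrative rather than our actual code):

    import time
    import pycassa

    # Hypothetical setup -- keyspace/CF names are real, everything else is illustrative.
    pool = pycassa.ConnectionPool('Tracelytics', server_list=['localhost:9160'],
                                  timeout=2.0)          # 2 s recv timeout, as in our client
    backtraces = pycassa.ColumnFamily(pool, 'Backtraces')

    keys = ['bt:%d' % i for i in xrange(99)]             # made-up keys; real batches are 10-200 rows

    start = time.time()
    rows = backtraces.multiget(keys, column_count=2)     # one multiget_slice, two columns per row
    print 'multiget of %d keys took %.3fs' % (len(keys), time.time() - start)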
I've answered as many of the questions as I can, and have also included a reply to Rob's statements below. In short, I don't fully understand the read latency metrics from JMX, but they don't seem to line up well with my client-side latency measurements. All numeric stats are from a dev environment, which has smaller usage and data volume than production but still exhibits long multiget_slice waits leading to client timeouts. Responses inline:

On Fri, Feb 4, 2011 at 3:02 PM, aaron morton <aa...@thelastpickle.com> wrote:

> What operation are you calling ? Are you trying to read the entire row
> back?

It's a multiget_slice of 10-200 rows from a single CF, reading two columns for each row, which is the entire row entry.

> How many SSTables do you have for the CF? Does your data have a lot of
> overwrites ? Have you modified the default compaction settings ?

Compaction settings are default AFAIK. However, this is currently a 1-node cluster, so it's running with gc_grace_seconds=0. If overwrite means an update, then yes--though the data will always be the same, we do write the same entry repeatedly as a way of refreshing its TTL. For some entries, this may happen multiple times per minute.

LiveDiskSpaceUsed: 1927107766
LiveSSTableCount: 6

> Do you have row cache enabled ?

No. Given the fairly large set of data and occasional random access, I doubt that row cache would be a good solution to this problem. Here are the settings:

    ColumnFamily: Backtraces
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period: 0.0/0
      Key cache size / save period: 200000.0/3600
      Memtable thresholds: 0.3/64/60
      GC grace seconds: 0
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: []

> Can you use JConsole to check the read latency for the CF?

Used a jython JMX script I found on the mailing list. Output is below, though I'm not sure how to interpret the histograms. I take it RecentReadLatency is the very last observed request? If so, this was for a multiget of 99 items. Or is that for one particular get within the multiget?
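For reference, the script boils down to something like the following Jython sketch (a reconstruction rather than the exact script from the list, and it assumes Cassandra 0.7's default JMX port of 8080 on localhost):

    # Jython: read the per-CF read latency attributes over JMX.
    from javax.management import ObjectName
    from javax.management.remote import JMXConnectorFactory, JMXServiceURL

    url = JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi")
    connector = JMXConnectorFactory.connect(url)
    mbeans = connector.getMBeanServerConnection()

    cf = ObjectName("org.apache.cassandra.db:type=ColumnFamilies,"
                    "keyspace=Tracelytics,columnfamily=Backtraces")
    for attr in ("ReadCount", "TotalReadLatencyMicros", "RecentReadLatencyMicros",
                 "RecentReadLatencyHistogramMicros"):
        print attr, mbeans.getAttribute(cf, attr)
    connector.close()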
objectname= org.apache.cassandra.db:type=ColumnFamilies,keyspace=Tracelytics,columnfamily=Backtraces
PendingTasks: 0
MemtableFlushAfterMins: 60
MemtableOperationsInMillions: 0.3
MinRowSize: 310
MaxRowSize: 43388628
MeanRowSize: 1848
ColumnFamilyName: Backtraces
MemtableColumnsCount: 53880
MemtableDataSize: 51052251
MemtableSwitchCount: 470
RecentSSTablesPerReadHistogram: array('l',[34L, 0L, 6L, 12L, 10L, 101L, 106L, 160L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
SSTablesPerReadHistogram: array('l',[1747L, 2196L, 1319L, 578L, 1300L, 5815L, 2108L, 1649L, 13L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
ReadCount: 16725
RecentReadLatencyMicros: 133399.0536130536
LifetimeReadLatencyHistogramMicros: array('l',[0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 38L, 52L, 57L, 50L, 92L, 166L, 289L, 283L, 318L, 223L, 137L, 108L, 65L, 57L, 287L, 549L, 408L, 357L, 538L, 734L, 883L, 548L, 478L, 329L, 244L, 226L, 168L, 262L, 291L, 290L, 355L, 611L, 965L, 836L, 694L, 381L, 136L, 137L, 178L, 326L, 181L, 140L, 188L, 151L, 146L, 146L, 105L, 130L, 102L, 106L, 130L, 135L, 144L, 149L, 173L, 201L, 191L, 188L, 208L, 203L, 155L, 122L, 74L, 45L, 26L, 18L, 15L, 5L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
RecentReadLatencyHistogramMicros: array('l',[0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 8L, 8L, 5L, 4L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 0L, 0L, 0L, 4L, 9L, 3L, 4L, 8L, 13L, 24L, 19L, 32L, 51L, 22L, 17L, 7L, 1L, 1L, 1L, 1L, 4L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 5L, 12L, 9L, 11L, 8L, 8L, 15L, 13L, 20L, 12L, 15L, 13L, 7L, 5L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
TotalReadLatencyMicros: 1121884414
WriteCount: 10944316
TotalWriteLatencyMicros: 386100317
RecentWriteLatencyMicros: 34.614922206506364
LifetimeWriteLatencyHistogramMicros: array('l',[0L, 9L, 1181L, 5950L, 12524L, 19751L, 27521L, 34963L, 96032L, 158925L, 278830L, 615547L, 664807L, 951370L, 1334996L, 1652907L, 1745019L, 1517550L, 1091827L, 520153L, 164867L, 34155L, 6750L, 3239L, 1890L, 909L, 534L, 398L, 285L, 232L, 179L, 159L, 133L, 130L, 85L, 71L, 64L, 42L, 40L, 27L, 40L, 40L, 33L, 15L, 8L, 12L, 8L, 7L, 7L, 11L, 10L, 8L, 5L, 7L, 5L, 12L, 19L, 11L, 5L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
RecentWriteLatencyHistogramMicros: array('l',[0L, 0L, 0L, 0L, 0L, 2L, 3L, 3L, 17L, 127L, 228L, 526L, 544L, 967L, 1521L, 1953L, 2177L, 1801L, 1043L, 333L, 47L, 6L, 7L, 1L, 2L, 2L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])
TotalDiskSpaceUsed: 12194409539
LiveDiskSpaceUsed: 1936759116
LiveSSTableCount: 7
BloomFilterFalsePositives: 2
RecentBloomFilterFalsePositives: 1
BloomFilterFalseRatio: 7.037297677691766E-4
RecentBloomFilterFalseRatio: 0.0
MinimumCompactionThreshold: 4
MaximumCompactionThreshold: 32
MemtableThroughputInMB: 64
EstimatedRowSizeHistogram: array('l',[0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 1799L, 1192L, 37786L, 196298L, 346833L, 154421L, 87274L, 33783L, 11927L, 7079L, 4770L, 9197L, 18893L, 8010L, 2984L, 1856L, 1332L, 988L, 719L, 845L, 782L, 504L, 725L, 315L, 257L, 290L, 307L, 205L, 139L, 142L, 92L, 103L, 90L, 84L, 112L, 44L, 76L, 50L, 24L, 37L, 15L, 31L, 23L, 9L, 6L, 17L, 8L, 3L, 5L, 10L, 3L, 0L, 0L, 8L, 2L, 2L, 0L, 1L, 1L, 0L])
EstimatedColumnCountHistogram: array('l',[0L, 0L, 3430L, 350443L, 28549L, 15586L, 9720L, 7111L, 9914L, 8150L, 4278L, 6069L, 2684L, 4458L, 2978L, 1677L, 1516L, 1207L, 1417L, 1528L, 901L, 801L, 765L, 790L, 437L, 627L, 476L, 369L, 309L, 415L, 576L, 516L, 34038L, 105418L, 229613L, 37927L, 38270L, 7787L, 2703L, 1769L, 1128L, 1093L, 1439L, 567L, 514L, 444L, 231L, 193L, 132L, 218L, 251L, 177L, 191L, 40L, 59L, 81L, 86L, 60L, 45L, 52L, 36L, 45L, 33L, 14L, 8L, 21L, 21L, 24L, 10L, 3L, 9L, 5L, 18L, 5L, 3L, 15L, 0L, 1L, 3L, 1L, 1L, 0L, 0L, 8L, 2L, 2L, 0L, 0L, 0L, 0L])
BuiltIndexes: []

Re: Rob's comments:

> Cassandra exposes metrics on a per-CF basis which indicate latency. This
> includes both cache hits and misses, as well as requests for rows which do
> not exist. It does NOT include an assortment of other latency causing
> things, like thrift. If you see two seconds of latency from the perspective
> of your application, you should compare it to the latency numbers Cassandra
> reports. If you are getting timed-out exceptions, that does seem relatively
> likely to be the cold cache "I went to disk" case, and the Cassandra
> latency numbers should reflect that.

On the client side, I measured ~2 seconds elapsed for the 99-item multiget, after which the client (pycassa) timed it out. The timing is measured at the thrift client level below pycassa, so it should avoid library internals. I'm not sure how to compare this to the JMX latency report.

Apologies for being so verbose!

dan

> Sorry for all the questions, the answer to your initial question is "mmm,
> that does not sound right. It will depend on...."
>
> Aaron
>
> On 5 Feb 2011, at 08:13, Dan Kuebrich wrote:
>
> > Hi all,
> >
> > It often takes more than two seconds to load:
> >
> > - one row of ~450 events comprising ~600k
> > - cluster size of 1
> > - client is pycassa 1.04
> > - timeout on recv
> > - cold read (I believe)
> > - load generally < 0.5 on a 4-core machine, 2 EC2 instance store drives
> > for cassandra
> > - cpu wait generally < 1%
> >
> > Often the following sequence occurs:
> >
> > 1. First attempt times out after 2 sec
> > 2. Second attempt loads fine on immediate retry
> >
> > So, I assume it's an issue about cache miss and going to disk. Is 2
> > seconds the normal "I went to disk" latency for cassandra? What should we
> > look to tune, if anything? I don't think keeping everything in-memory is
> > an option for us given dataset size and access pattern (hot set is stuff
> > being currently written, stuff being accessed is likely to be older).
> >
> > I didn't notice this problem with cassandra 0.6.8 and pycassa 0.3.
> >
> > Thanks,
> > dan
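P.S. A back-of-envelope reading of the JMX numbers above, assuming I'm interpreting them correctly (in particular, I'm not certain whether each row fetched in a multiget counts as one "read" in these stats):

    # Lifetime mean read latency for the CF, from the dump above:
    print 1121884414 / 16725.0   # TotalReadLatencyMicros / ReadCount ~= 67078 us, i.e. ~67 ms per read
    # RecentReadLatencyMicros ~= 133399 us, i.e. ~133 ms per read recently --
    # well short of the ~2 s the client waited for the 99-key multiget before
    # timing out, but seemingly consistent with the cold-read theory.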