> So basically, I'm flooding the system right ? For example 99303 means there
> are 99303 key reads pending, possibly from just a couple MultiSlice gets ?
Yes, and then some. Each row you ask for in a multiget turns into a single
row request on the server. You are overloading the server.

> - exactly how much data are you asking for ? how many rows and what sort of
> slice
> According to some munin monitoring, the server is cranking out to the client,
> over the network, 10Mbits/s = 1.25 Mbytes/s

I was thinking in terms of rows, but that's irrelevant now. The answer is "a lot".

> BTW, cassandra is running on an XFS filesystem over LVM

Others know more about this than me.

> One question : when nodetool cfstats says the average read latency is 5ms, is
> that counted once the query is being executed or does that include the time
> spent "pending" ?

In the cfstats output, the latency displayed under the Keyspace is the total
latency for all CFs divided by the total read count. The latency displayed for
the individual CFs is the actual time taken getting the columns requested for
a row. It's taking 5ms to read the data from disk and apply the filter.

I'd check that you are reading the data you expect, then wind back the number
of requests and the rows / columns requested. Get to a stable baseline and
then add pressure to see when / how things go wrong.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jun 2011, at 08:00, Philippe wrote:

> Aaron,
>
> - what version are you on ?
> 0.7.6-2
>
> - what is the concurrent_reads config setting ?
> concurrent_reads: 64
> concurrent_writes: 64
>
> Given that I've got 4 cores and SSD drives, I doubled the recommended
> concurrent writes.
> Given that I've RAID-0ed the SSD drives, I figured I could at least double
> for SSD and double again for RAID-0 the recommended value.
> Wrong assumptions ?
>
> BTW, cassandra is running on an XFS filesystem over LVM over RAID-0
>
> - what is nodetool tpstats showing during the slow down ?
> The only value that changes is the ReadStage line.
> Here's values from a sample every second:
>
> Pool Name                    Active   Pending      Completed
> ReadStage                        64     99303      463085056
> ReadStage                        64     88430      463095929
> ReadStage                        64     91937      463107782
>
> So basically, I'm flooding the system right ? For example 99303 means there
> are 99303 key reads pending, possibly from just a couple MultiSlice gets ?
>
> - exactly how much data are you asking for ? how many rows and what sort of
> slice
> According to some munin monitoring, the server is cranking out to the client,
> over the network, 10Mbits/s = 1.25 Mbytes/s
>
> The same munin monitoring shows me 200Mbytes/s read from the disks. This is
> what is worrying me...
>
> - has there been a lot of deletes or TTL columns used ?
> No deletes, only updates; I don't know if that counts as deletes though...
>
> This is going to be a read-heavy, update-heavy cluster.
> No TTL columns, no counter columns
>
> One question : when nodetool cfstats says the average read latency is 5ms, is
> that counted once the query is being executed or does that include the time
> spent "pending" ?
>
> Thanks
> Philippe
>
> Hope that helps.
> Aaron
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7 Jun 2011, at 10:09, Philippe wrote:
>
>> Ok, here it goes again... No swapping at all...
>>
>> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>>  r  b   swpd   free   buff    cache   si   so     bi    bo    in    cs us sy id wa
>>  1 63  32044  88736  37996  7116524    0    0 227156     0 18314  5607 30  5 11 53
>>  1 63  32044  90844  37996  7103904    0    0 233524   202 17418  4977 29  4  9 58
>>  0 42  32044  91304  37996  7123884    0    0 249736     0 16197  5433 19  6  3 72
>>  3 25  32044  89864  37996  7135980    0    0 223140    16 18135  7567 32  5 11 52
>>  1  1  32044  88664  37996  7150728    0    0 229416   128 19168  7554 36  4 10 51
>>  4  0  32044  89464  37996  7149428    0    0 213852    18 21041  8819 45  5 12 38
>>  4  0  32044  90372  37996  7149432    0    0 233086   142 19909  7041 43  5 10 41
>>  7  1  32044  89752  37996  7149520    0    0 206906     0 19350  6875 50  4 11 35
>>
>> Lots and lots of disk activity:
>> iostat -dmx 2
>> Device:  rrqm/s  wrqm/s       r/s   w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
>> sda       52.50    0.00   7813.00  0.00  108.01   0.00    28.31   117.15  14.89   14.89    0.00   0.11  83.00
>> sdb       56.00    0.00   7755.50  0.00  108.51   0.00    28.66   118.67  15.18   15.18    0.00   0.11  82.80
>> md1        0.00    0.00      0.00  0.00    0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>> md5        0.00    0.00  15796.50  0.00  219.21   0.00    28.42     0.00   0.00    0.00    0.00   0.00   0.00
>> dm-0       0.00    0.00  15796.50  0.00  219.21   0.00    28.42   273.42  17.03   17.03    0.00   0.05  83.40
>> dm-1       0.00    0.00      0.00  0.00    0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
>>
>> More info :
>> - the data directory containing the data I'm querying is 9.7GB, and this is
>>   a server with 16GB
>> - I'm hitting the server with 6 concurrent multigetsuperslicequeries on
>>   multiple keys; some of them can bring back quite a lot of data
>> - I'm reading all the keys for one column, pretty much sequentially
>>
>> This is a query on a rollup table that was originally in MySQL, and it
>> doesn't look like the performance when querying by key is any better. So
>> I'm betting I'm doing something wrong here... but what ?
>>
>> Any ideas ?
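[Editor's note: the advice earlier in the thread boils down to winding back the number of rows requested per multiget. A minimal client-side sketch of that idea, assuming a `fetch_rows(keys)` callable that wraps whatever multiget call your client library provides — the helper names and the batch size are illustrative, not part of any Cassandra client API:]

```python
def chunked(keys, size):
    """Yield successive batches of at most `size` keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

def multiget_in_batches(fetch_rows, keys, batch_size=32):
    """Run one large multiget as several smaller requests, so each
    request queues a bounded number of single-row reads on the server
    instead of dumping thousands into ReadStage at once."""
    results = {}
    for batch in chunked(keys, batch_size):
        # One bounded request per batch; the server sees at most
        # batch_size row reads from this call.
        results.update(fetch_rows(batch))
    return results
```

[Starting from a small batch size and raising it while watching `nodetool tpstats` Pending is one way to find the stable baseline Aaron suggests.]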
>> Thanks
>>
>> 2011/6/6 Philippe <watche...@gmail.com>
>> hum.. no, it wasn't swapping. cassandra was the only thing running on that
>> server, and I was querying the same keys over and over.
>>
>> I restarted Cassandra and, doing the same thing, io is now down to zero
>> while cpu is up, which doesn't surprise me as much.
>>
>> I'll report if it happens again.
>>
>> On 5 June 2011 at 16:55, "Jonathan Ellis" <jbel...@gmail.com> wrote:
>>
>> > You may be swapping.
>> >
>> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
>> > explains how to check this, as well as how to see what threads are busy
>> > in the Java process.
>> >
>> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watche...@gmail.com> wrote:
>> >> Hello,
>> >> I am evaluating cassandra and I'm running into some strange IO
>> >> behavior that I can't explain; I'd like some help/ideas to troubleshoot it.
>> >> I am running a 1-node cluster with a keyspace consisting of two column
>> >> families, one of which has dozens of supercolumns, each containing dozens
>> >> of columns.
>> >> All in all, this is a couple of gigabytes of data, 12GB on the hard drive.
>> >> The hardware is pretty good : 16GB memory + RAID-0 SSD drives with LVM and
>> >> an i5 processor (4 cores).
>> >>
>> >> Keyspace: xxxxxxxxxxxxxxxxxxx
>> >>   Read Count: 460754852
>> >>   Read Latency: 1.108205793092766 ms.
>> >>   Write Count: 30620665
>> >>   Write Latency: 0.01411020877567486 ms.
>> >>   Pending Tasks: 0
>> >>     Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
>> >>     SSTable count: 5
>> >>     Space used (live): 548700725
>> >>     Space used (total): 548700725
>> >>     Memtable Columns Count: 0
>> >>     Memtable Data Size: 0
>> >>     Memtable Switch Count: 11
>> >>     Read Count: 2891192
>> >>     Read Latency: NaN ms.
>> >>     Write Count: 3157547
>> >>     Write Latency: NaN ms.
>> >>     Pending Tasks: 0
>> >>     Key cache capacity: 367396
>> >>     Key cache size: 367396
>> >>     Key cache hit rate: NaN
>> >>     Row cache capacity: 112683
>> >>     Row cache size: 112683
>> >>     Row cache hit rate: NaN
>> >>     Compacted row minimum size: 125
>> >>     Compacted row maximum size: 924
>> >>     Compacted row mean size: 172
>> >>     Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
>> >>     SSTable count: 7
>> >>     Space used (live): 8707538781
>> >>     Space used (total): 8707538781
>> >>     Memtable Columns Count: 0
>> >>     Memtable Data Size: 0
>> >>     Memtable Switch Count: 30
>> >>     Read Count: 457863660
>> >>     Read Latency: 2.381 ms.
>> >>     Write Count: 27463118
>> >>     Write Latency: NaN ms.
>> >>     Pending Tasks: 0
>> >>     Key cache capacity: 4518387
>> >>     Key cache size: 4518387
>> >>     Key cache hit rate: 0.9247881700850826
>> >>     Row cache capacity: 1349682
>> >>     Row cache size: 1349682
>> >>     Row cache hit rate: 0.39400533823415573
>> >>     Compacted row minimum size: 125
>> >>     Compacted row maximum size: 6866
>> >>     Compacted row mean size: 165
>> >>
>> >> My app makes a bunch of requests using a MultigetSuperSliceQuery for a set
>> >> of keys, typically a couple dozen at most. It also selects a subset of the
>> >> supercolumns. I am running 8 requests in parallel at most.
>> >>
>> >> Two days ago, I ran a 1.5 hour process that basically read every key. The
>> >> server had no IO waits and everything was humming along. However, right at
>> >> the end of the process, there was a huge spike in IOs. I didn't think much
>> >> of it.
>> >> Today, after two days of inactivity, any query I run raises the IOs to 80%
>> >> utilization of the SSD drives, even though I'm running the same query over
>> >> and over (no cache??)
>> >> Any ideas on how to troubleshoot this, or better, how to solve this ?
>> >> thanks
>> >> Philippe
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of DataStax, the source for professional Cassandra support
>> > http://www.datastax.com
>> >