Ok, here it goes again... No swapping at all...

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff    cache  si  so      bi   bo     in    cs  us  sy  id  wa
 1 63  32044  88736  37996  7116524   0   0  227156    0  18314  5607  30   5  11  53
 1 63  32044  90844  37996  7103904   0   0  233524  202  17418  4977  29   4   9  58
 0 42  32044  91304  37996  7123884   0   0  249736    0  16197  5433  19   6   3  72
 3 25  32044  89864  37996  7135980   0   0  223140   16  18135  7567  32   5  11  52
 1  1  32044  88664  37996  7150728   0   0  229416  128  19168  7554  36   4  10  51
 4  0  32044  89464  37996  7149428   0   0  213852   18  21041  8819  45   5  12  38
 4  0  32044  90372  37996  7149432   0   0  233086  142  19909  7041  43   5  10  41
 7  1  32044  89752  37996  7149520   0   0  206906    0  19350  6875  50   4  11  35
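
For anyone who wants to check the vmstat sample above rather than eyeball it, here is a minimal Python sketch that parses the eight rows and reports whether any swap activity occurred, the mean iowait, and the mean read rate. The helper names (parse_vmstat, summarize) are made up for illustration, and the conversion of bi to MiB/s assumes vmstat reports 1 KiB blocks, which is the usual Linux behavior but is not confirmed anywhere in this thread.

```python
# Column names copied from the vmstat header in the sample above.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

# The eight sample rows from the vmstat output above.
SAMPLE = """\
1 63 32044 88736 37996 7116524 0 0 227156 0 18314 5607 30 5 11 53
1 63 32044 90844 37996 7103904 0 0 233524 202 17418 4977 29 4 9 58
0 42 32044 91304 37996 7123884 0 0 249736 0 16197 5433 19 6 3 72
3 25 32044 89864 37996 7135980 0 0 223140 16 18135 7567 32 5 11 52
1 1 32044 88664 37996 7150728 0 0 229416 128 19168 7554 36 4 10 51
4 0 32044 89464 37996 7149428 0 0 213852 18 21041 8819 45 5 12 38
4 0 32044 90372 37996 7149432 0 0 233086 142 19909 7041 43 5 10 41
7 1 32044 89752 37996 7149520 0 0 206906 0 19350 6875 50 4 11 35
"""


def parse_vmstat(text):
    """Turn whitespace-separated vmstat rows into dicts keyed by column name."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip header lines and anything that is not a full numeric row.
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def summarize(rows):
    """Report swap activity, mean iowait and mean read rate for the parsed rows."""
    n = len(rows)
    return {
        # Any non-zero si/so value means pages moved to or from swap.
        "swapping": any(r["si"] or r["so"] for r in rows),
        "mean_iowait_pct": sum(r["wa"] for r in rows) / n,
        # Assumption: bi is in 1 KiB blocks per second, so /1024 gives MiB/s.
        "mean_read_mib_s": sum(r["bi"] for r in rows) / n / 1024,
    }


rows = parse_vmstat(SAMPLE)
summary = summarize(rows)
print(summary)
```

On this sample, si and so are zero in every row, so the check reports no swapping, and the mean iowait comes out at 50%. Under the 1 KiB block assumption the read rate is roughly 222 MiB/s.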

Lots and lots of disk activity:

iostat -dmx 2
Device:  rrqm/s wrqm/s      r/s  w/s  rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       52.50   0.00  7813.00 0.00 108.01  0.00    28.31   117.15 14.89   14.89    0.00  0.11 83.00
sdb       56.00   0.00  7755.50 0.00 108.51  0.00    28.66   118.67 15.18   15.18    0.00  0.11 82.80
md1        0.00   0.00     0.00 0.00   0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00
md5        0.00   0.00 15796.50 0.00 219.21  0.00    28.42     0.00  0.00    0.00    0.00  0.00  0.00
dm-0       0.00   0.00 15796.50 0.00 219.21  0.00    28.42   273.42 17.03   17.03    0.00  0.05 83.40
dm-1       0.00   0.00     0.00 0.00   0.00  0.00     0.00     0.00  0.00    0.00    0.00  0.00  0.00

More info:
- the entire data directory holding the data I'm querying is 9.7GB, and this is a server with 16GB
- I'm hitting the server with 6 concurrent MultigetSuperSliceQuery requests on multiple keys; some of them can bring back quite a lot of data
- I'm reading all the keys for one column, pretty much sequentially

This is a query against a rollup table that was originally in MySQL, and querying by key does not seem to perform any better. So I'm betting I'm doing something wrong here... but what? Any ideas?

Thanks

2011/6/6 Philippe <watche...@gmail.com>

> hum... no, it wasn't swapping. Cassandra was the only thing running on that
> server, and I was querying the same keys over and over.
>
> I restarted Cassandra and, doing the same thing, IO is now down to zero
> while CPU is up, which doesn't surprise me as much.
>
> I'll report if it happens again.
>
> On 5 June 2011 at 16:55, "Jonathan Ellis" <jbel...@gmail.com> wrote:
>
> > You may be swapping.
> >
> > http://spyced.blogspot.com/2010/01/linux-performance-basics.html
> > explains how to check this as well as how to see what threads are busy
> > in the Java process.
> >
> > On Sat, Jun 4, 2011 at 5:34 PM, Philippe <watche...@gmail.com> wrote:
> >> Hello,
> >> I am evaluating using cassandra and I'm running into some strange IO
> >> behavior that I can't explain, I'd like some help/ideas to troubleshoot
> >> it.
> >> I am running a 1 node cluster with a keyspace consisting of two column
> >> families, one of which has dozens of supercolumns, each containing
> >> dozens of columns.
> >> All in all, this is a couple gigabytes of data, 12GB on the hard drive.
> >> The hardware is pretty good: 16GB memory + RAID-0 SSD drives with LVM
> >> and an i5 processor (4 cores).
> >>
> >> Keyspace: xxxxxxxxxxxxxxxxxxx
> >>   Read Count: 460754852
> >>   Read Latency: 1.108205793092766 ms.
> >>   Write Count: 30620665
> >>   Write Latency: 0.01411020877567486 ms.
> >>   Pending Tasks: 0
> >>     Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
> >>     SSTable count: 5
> >>     Space used (live): 548700725
> >>     Space used (total): 548700725
> >>     Memtable Columns Count: 0
> >>     Memtable Data Size: 0
> >>     Memtable Switch Count: 11
> >>     Read Count: 2891192
> >>     Read Latency: NaN ms.
> >>     Write Count: 3157547
> >>     Write Latency: NaN ms.
> >>     Pending Tasks: 0
> >>     Key cache capacity: 367396
> >>     Key cache size: 367396
> >>     Key cache hit rate: NaN
> >>     Row cache capacity: 112683
> >>     Row cache size: 112683
> >>     Row cache hit rate: NaN
> >>     Compacted row minimum size: 125
> >>     Compacted row maximum size: 924
> >>     Compacted row mean size: 172
> >>
> >>     Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
> >>     SSTable count: 7
> >>     Space used (live): 8707538781
> >>     Space used (total): 8707538781
> >>     Memtable Columns Count: 0
> >>     Memtable Data Size: 0
> >>     Memtable Switch Count: 30
> >>     Read Count: 457863660
> >>     Read Latency: 2.381 ms.
> >>     Write Count: 27463118
> >>     Write Latency: NaN ms.
> >>     Pending Tasks: 0
> >>     Key cache capacity: 4518387
> >>     Key cache size: 4518387
> >>     Key cache hit rate: 0.9247881700850826
> >>     Row cache capacity: 1349682
> >>     Row cache size: 1349682
> >>     Row cache hit rate: 0.39400533823415573
> >>     Compacted row minimum size: 125
> >>     Compacted row maximum size: 6866
> >>     Compacted row mean size: 165
> >>
> >> My app makes a bunch of requests using a MultigetSuperSliceQuery for a
> >> set of keys, typically a couple dozen at most. It also selects a subset
> >> of the supercolumns. I am running 8 requests in parallel at most.
> >>
> >> Two days ago, I ran a 1.5 hour process that basically read every key.
> >> The server had no IOwaits and everything was humming along. However,
> >> right at the end of the process, there was a huge spike in IOs. I didn't
> >> think much of it.
> >> Today, after two days of inactivity, any query I run raises the IOs to
> >> 80% utilization of the SSD drives even though I'm running the same query
> >> over and over (no cache??)
> >> Any ideas on how to troubleshoot this, or better, how to solve this?
> >> thanks
> >> Philippe
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com