Yes, we finally got to the bottom of it. There was a bug in some code in our web tier, so although our client load had not changed, the web tier was reading a lot more than usual. It was still a good experience to debug something like this and get to the bottom of it. We always seem to be learning a new corner of the system.
It just happened that someone started using a new feature which, due to a bug, slammed our servers right when we added our first node to the cluster. We will be adding the second node tomorrow, as things look great on version 1.2.2.

Thanks,
Dean

From: aaron morton <aa...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, May 14, 2013 12:44 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: (better info)any way to get the #writes/second, reads per second

> Any reason why cassandra might be reading a lot from the data disks (not the commit log disk) more than usual?

On the new node or all nodes? Maybe a cold key cache, or cold memory-mapped files due to a change in the data distribution? Did it settle down?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/05/2013, at 5:06 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

Ah, okay, iostat -x NEEDS a number: "iostat -x 5" works better (the first form always shows 4% util, while the second shows 100%). iotop seems a bit better here.

So we know that since we added our new node we are slammed with reads, and no one is running compactions according to "clush -g datanodes nodetool compactionstats".

Any reason why cassandra might be reading a lot from the data disks (not the commit log disk) more than usual?

Thanks,
Dean

On 5/13/13 10:46 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

We are running a pretty consistent load on our cluster and added a new node to a 6-node cluster on Friday (QA worked great; production, not so much). One mistake that was made was starting up the new node and then disabling the firewall :( which allowed the other nodes to discover it BEFORE it had bootstrapped itself. We shut the node down and started it again, and it bootstrapped itself, streaming all the data in. Since then, though, all the nodes have had really, really high load numbers, and we are still trying to figure out what is going on.

Is there any way to get the number of reads/second and writes/second through JMX or something? The only way I can see of doing this is calculating it manually by sampling the read count and dividing by the start/stop times of a manual stopwatch (the time range).

Also, while my load is "load average: 20.31, 19.10, 19.72", what does a normal iostat look like? My iostat await time is 13.66 ms, which I think is kind of bad, but not bad enough to cause a load of 20.31?

Device:  rrqm/s  wrqm/s    r/s    w/s   rsec/s   wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.02    0.07  11.70   1.96  1353.67   702.88    150.58      0.19  13.66   3.61   4.93
sdb        0.00    0.02   0.11   0.46    20.72    97.54    206.70      0.00   1.33   0.67   0.04

Thanks,
Dean
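
For what it's worth, the manual stopwatch calculation described above can be scripted against JMX: sample a monotonically increasing request counter twice and divide the delta by the interval. Below is a minimal sketch in Java, assuming the coordinator-level ClientRequest metrics MBeans that the 1.2 line exposes and the default JMX port of 7199; the MBean names and the hostname argument are assumptions worth verifying with jconsole against your build.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class OpsPerSecond {
    public static void main(String[] args) throws Exception {
        // Host is a placeholder; 7199 is Cassandra's default JMX port.
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

        // Coordinator-level request latency timers (1.2-era metrics names,
        // assumed here). Their Count attribute only ever increases, so two
        // samples divided by the elapsed time give operations per second.
        ObjectName reads = new ObjectName(
                "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
        ObjectName writes = new ObjectName(
                "org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency");

        long r1 = (Long) mbs.getAttribute(reads, "Count");
        long w1 = (Long) mbs.getAttribute(writes, "Count");
        long t1 = System.nanoTime();

        Thread.sleep(10000L); // sample over a 10-second window

        long r2 = (Long) mbs.getAttribute(reads, "Count");
        long w2 = (Long) mbs.getAttribute(writes, "Count");
        double secs = (System.nanoTime() - t1) / 1e9;

        System.out.printf("reads/s:  %.1f%n", (r2 - r1) / secs);
        System.out.printf("writes/s: %.1f%n", (w2 - w1) / secs);
        jmxc.close();
    }
}

Note that ClientRequest counts requests at the coordinator, so run this against each node (or sum across nodes) rather than treating one node's numbers as a cluster-wide rate.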