Yes, we finally got to the bottom of it.  There was some code in our web tier
that, even though our client load had not changed, was causing the web tier to
read a lot more than usual.  It was still a good experience to debug something
like this and get to the bottom of it.  We always seem to be learning a new
corner of the system.

It just so happened that someone started using this new feature right when we
added our first node to the cluster, and a bug in it slammed our servers.  We
will be adding the second node tomorrow since things look great on version
1.2.2.

Thanks,
Dean

From: aaron morton <aa...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, May 14, 2013 12:44 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: (better info)any way to get the #writes/second, reads per second

Any reason why Cassandra might be reading a lot more than usual from the
data disks (not the commit log disk)?
On the new node or all nodes ?

Maybe a cold Key Cache or cold memory-mapped files due to a change in the data
distribution ?

Did it settle down ?

Cheers


-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/05/2013, at 5:06 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

Ah, okay, iostat -x NEEDS an interval; "iostat -x 5" works better (the
first report always shows 4% util while the second shows 100%).  iotop
seems a bit better here.

So we know that since we added our new node, we are slammed with reads and
no node is running compactions according to "clush -g datanodes nodetool
compactionstats".

Any reason why Cassandra might be reading a lot more than usual from the
data disks (not the commit log disk)?

Thanks,
Dean

On 5/13/13 10:46 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

We run a pretty consistent load on our cluster and added a new node to our
6 node cluster on Friday (QA worked great, but production not so much).
One mistake that was made was starting up the new node and then disabling
the firewall :( which allowed the other nodes to discover it BEFORE it had
bootstrapped itself.  We shut the node down, brought it back up, and it
bootstrapped itself, streaming all the data in.

After that though, all the nodes have really, really high load numbers
now.  We are still trying to figure out what is going on.

Is there any way to get the number of reads/second and writes/second
through JMX or something?  The only way I can see of doing this is
calculating it manually: sampling the read count and dividing by the
elapsed time between my stopwatch's start/stop (the time range).
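
Something like the little JMX client below is what I had in mind for that
manual calculation.  It is just a rough sketch: I am assuming JMX is on the
default port 7199, that the 1.2-era per-column-family MBean
(org.apache.cassandra.db:type=ColumnFamilies,...) exposes the ReadCount and
WriteCount attributes, and MyKeyspace/MyCf are placeholders for real names.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadRateSampler {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes JMX on port 7199 by default; the host is a placeholder.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = connector.getMBeanServerConnection();

        // Per-column-family MBean (1.2-era name); swap in your keyspace/CF.
        ObjectName cf = new ObjectName(
            "org.apache.cassandra.db:type=ColumnFamilies,keyspace=MyKeyspace,columnfamily=MyCf");

        // Sample the ever-increasing read counter twice and divide by the
        // elapsed wall-clock time to get reads/second.
        long reads1 = (Long) mbs.getAttribute(cf, "ReadCount");
        long t1 = System.currentTimeMillis();
        Thread.sleep(10000);                      // 10 second sample window
        long reads2 = (Long) mbs.getAttribute(cf, "ReadCount");
        long t2 = System.currentTimeMillis();

        System.out.printf("reads/sec: %.1f%n",
            (reads2 - reads1) * 1000.0 / (t2 - t1));

        connector.close();
    }
}

Sampling the WriteCount attribute the same way would give writes/second.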

Also, while my load average is 20.31, 19.10, 19.72, what does a normal
iostat look like?  My iostat await time is 13.66 ms, which I think is
kind of bad, but not bad enough to cause a load of 20.31?

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.02     0.07   11.70    1.96  1353.67   702.88   150.58     0.19   13.66   3.61   4.93
sdb               0.00     0.02    0.11    0.46    20.72    97.54   206.70     0.00    1.33   0.67   0.04

Thanks,
Dean

