> I am querying in batches of 256 keys max. Each batch may slice between 1 and
> 5 explicit super columns (I need all the columns in each super column, there
> are at the very most a couple dozen columns per SC).

That's a pretty high row count, bigger is not always better.
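For reference, the kind of read described here (a multiget over a batch of row keys, slicing a handful of explicitly named super columns) looks roughly like the sketch below with Hector, including splitting a large key list into smaller sub-batches as is tried later in the thread. This is a minimal sketch: the column family name "Counters", the string types, and the helper itself are illustrative assumptions, not taken from the thread, and counter columns would go through Hector's counter query variants rather than the plain super slice query shown here.

import java.util.ArrayList;
import java.util.List;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.SuperRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.MultigetSuperSliceQuery;

public class BatchedSuperSliceReader {

    private static final StringSerializer SS = StringSerializer.get();

    // Reads the named super columns (and every column inside them) for the
    // given row keys, splitting the key list into sub-batches so each
    // multiget stays small. "Counters" is a hypothetical column family name
    // and all types are assumed to be strings for the sake of the example.
    static List<SuperRows<String, String, String, String>> read(
            Keyspace keyspace, List<String> keys, int batchSize,
            String... superColumnNames) {

        List<SuperRows<String, String, String, String>> results =
                new ArrayList<SuperRows<String, String, String, String>>();

        for (int i = 0; i < keys.size(); i += batchSize) {
            List<String> batch = keys.subList(i, Math.min(i + batchSize, keys.size()));

            MultigetSuperSliceQuery<String, String, String, String> query =
                    HFactory.createMultigetSuperSliceQuery(keyspace, SS, SS, SS, SS);
            query.setColumnFamily("Counters");
            query.setKeys(batch.toArray(new String[batch.size()]));
            // 1 to 5 explicit super column names per batch, as described above.
            query.setColumnNames(superColumnNames);

            results.add(query.execute().get());
        }
        return results;
    }
}

With a helper like this, going from 256-key batches down to 16-key batches is just a change to batchSize, at the cost of more round trips for the same number of keys, which is consistent with the per-thread throughput drop reported further down the thread.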
I just remembered you are using the BOP. Are the rows you are reading all on
the same node ? Is the load evenly distributed across the cluster ? It sounds
like a single node is getting overloaded and the others are doing little.

In your isolated experiment:

> Another experiment : I stopped the process that does all the reading and a
> little of the writing. All that's left is a single-threaded process
> sending counter updates as fast as it can in batches of up to 50 mutations.
> First replica : pending counts go up into the low hundreds and back to 0,
> active up to 3 or 5 and that's a max. Some mutation stage active & pendings
> => the process is indeed faster at updating the counters, so that doesn't
> surprise me given that a counter write requires a read.
> Second & third replicas : no read stage pendings at all. A little
> RequestResponseStage as earlier.

What CL are you using ? Which thread pool is showing pending ?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/12/2011, at 11:15 AM, Philippe wrote:

> Along the same lines as the last experiment I did (the cluster is only being
> updated by a single-threaded batching process).
> All nodes are the same hardware & configuration. Why on earth would one node
> require disk IO and not the 2 replicas ?
>
> Primary replica shows some disk activity (iostat shows about 40%)
> ----total-cpu-usage---- -dsk/total-
> usr sys idl wai hiq siq| read  writ
>  67  10  19   2   0   3|4244k  364k
>
> whereas the 2nd & 3rd replicas do not
> ----total-cpu-usage---- -dsk/total-
> usr sys idl wai hiq siq| read  writ
>  42  13  41   0   0   3|   0     0
>  47  15  34   0   0   4|4096B  185k
>  49  14  35   0   0   3|   0  8192B
>  47  16  33   0   0   4|   0  4096B
>  44  13  41   0   0   3| 284k  112k
>
> 3rd
>  11   2  87   1   0   0|   0   136k
>   0   0  99   0   0   0|   0     0
>   9   1  90   0   0   0|4096B  128k
>   2   2  96   0   0   0|   0     0
>   0   0  99   0   0   0|   0     0
>  11   1  87   0   0   0|   0   128k
>
> Philippe
>
> 2011/12/21 Philippe <watche...@gmail.com>
>
> Hi Aaron,
>
>> How many rows are you asking for in the multiget_slice and what thread
>> pools are showing pending tasks ?
>
> I am querying in batches of 256 keys max. Each batch may slice between 1 and
> 5 explicit super columns (I need all the columns in each super column, there
> are at the very most a couple dozen columns per SC).
>
> On the first replica, only ReadStage ever shows any pending. All the others
> have 1 to 10 pending from time to time only. Here's a typical "high pending
> count" reading on the first replica for the data hotspot:
> ReadStage    13    5238    10374301128    0    0
> I've got a watch running every two seconds and I see the numbers vary every
> time, going from that high point to 0 active, 0 pending. The one thing I've
> noticed is that I hardly ever see the Active count stay up at the current 2s
> sampling rate.
> On the 2 other replicas, I hardly ever see any pendings on ReadStage and
> Active hardly goes up to 1 or 2. But I do see a little PENDING on
> RequestResponseStage, which goes up into the tens or hundreds from time to time.
>
> If I'm flooding that one replica, shouldn't the ReadStage Active count be at
> maximum capacity ?
>
> I've already thought of CASSANDRA-2980 but I'm running 0.8.7 and 0.8.9.
>
>> Also, what happens when you reduce the number of rows in the request ?
>
> I've reduced the requests to batches of 16. I've had to increase the number
> of threads from 30 to 90 in order to get the same key throughput because the
> throughput I measure drastically goes down on a per-thread basis.
> What I see :
> - CPU utilization is lower on the first replica (why would that be if the
>   batches are smaller ?)
> - Pending ReadStage on the first replica seems to be staying higher longer.
>   Still goes down to 0 regularly.
> - Lowering to 60 client threads, I see non-zero active MutationStage and
>   ReplicateOnWriteStage more often.
> For our use-case, the higher the throughput per client thread, the less
> rework will be done in our processing.
>
> Another experiment : I stopped the process that does all the reading and a
> little of the writing. All that's left is a single-threaded process
> sending counter updates as fast as it can in batches of up to 50 mutations.
> First replica : pending counts go up into the low hundreds and back to 0,
> active up to 3 or 5 and that's a max. Some mutation stage active & pendings
> => the process is indeed faster at updating the counters, so that doesn't
> surprise me given that a counter write requires a read.
> Second & third replicas : no read stage pendings at all. A little
> RequestResponseStage as earlier.
>
> Cheers
> Philippe
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/12/2011, at 11:57 AM, Philippe wrote:
>
>> Hello,
>> 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super
>> columns. Read queries are multiget slices of super columns inside of which I
>> read every column for processing (20-30 at most), using Hector with default
>> settings.
>> Watching tpstats on the 3 nodes holding the data most often queried, I
>> see the pending count increase only on the "main replica" and I see heavy
>> CPU load and network load only on that node. The other nodes seem to be
>> doing very little.
>>
>> Aren't counter read requests supposed to be round-robin across replicas ?
>> I'm confused as to why the nodes don't exhibit the same load.
>>
>> Thanks
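On the "What CL are you using ?" question above: the original message says Hector is used with default settings, and with Hector the consistency level comes from the keyspace's ConsistencyLevelPolicy (QUORUM for reads and writes when no policy is supplied). Below is a minimal sketch of pinning it explicitly; the keyspace name "MyKeyspace" and the already-built Cluster handle are placeholders, not taken from the thread.

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class KeyspaceWithExplicitCL {

    // Builds a Keyspace whose reads are sent at CL.ONE and writes at CL.QUORUM.
    // "MyKeyspace" is a placeholder name; the point is only that the CL is
    // chosen by this policy object rather than left at Hector's default.
    static Keyspace create(Cluster cluster) {
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        return HFactory.createKeyspace("MyKeyspace", cluster, policy);
    }
}

The CL in effect matters for the load question in the thread because it determines how many of the three replicas each read touches: two at QUORUM, one (picked by the coordinator) at ONE.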