>
> That's a pretty high row count, bigger is not always better.
>
Yes, I've learned that! However in my case, it is better for the
throughput per thread. It may be that the whole cluster throughput is
lower, but in my case higher throughput per thread is better.
> I just remembered you are using the BOP. Are the rows you are reading all
> on the same node? Is the load evenly distributed across the cluster? It
> sounds like a single node is getting overloaded and the others are doing
> little.
>
No, the data being read/written most often is definitely on a single
replica. That I understand, and I know I must rebalance... My question is
why the ReadStage isn't showing similar performance across all three
replicas.

> In your isolated experiment.
>
>> Another experiment: I stopped the process that does all the reading and a
>> little of the writing. All that's left is a single-threaded process that
>> sends counter updates as fast as it can in batches of up to 50 mutations.
>> First replica: pending counts go up into the low hundreds and back to 0,
>> active up to 3 or 5 and that's the max. Some MutationStage active & pendings
>> => the process is indeed faster at updating the counters, so that doesn't
>> surprise me given that a counter write requires a read.
>> Second & third replicas: no ReadStage pendings at all. A little
>> RequestResponseStage as earlier.
>>
> What CL are you using?
>
Always forget that one... using QUORUM.

> Which thread pool is showing pending?
>
ReadStage is the one I'm talking about above when I don't mention the
stage explicitly.

Thanks

> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/12/2011, at 11:15 AM, Philippe wrote:
>
> Along the same line as the last experiment I did (the cluster is only being
> updated by a single-threaded batching process):
> All nodes are the same hardware & configuration. Why on earth would one
> node require disk IO and not the other 2 replicas?
>
> Primary replica shows some disk activity (iostat shows about 40%):
> ----total-cpu-usage---- -dsk/total-
> usr sys idl wai hiq siq| read  writ
>  67  10  19   2   0   3|4244k  364k
>
> whereas the 2nd & 3rd replicas do not:
> ----total-cpu-usage---- -dsk/total-
> usr sys idl wai hiq siq| read  writ
>  42  13  41   0   0   3|   0     0
>  47  15  34   0   0   4|4096B  185k
>  49  14  35   0   0   3|   0  8192B
>  47  16  33   0   0   4|   0  4096B
>  44  13  41   0   0   3| 284k  112k
>
> 3rd replica:
>  11   2  87   1   0   0|   0   136k
>   0   0  99   0   0   0|   0     0
>   9   1  90   0   0   0|4096B  128k
>   2   2  96   0   0   0|   0     0
>   0   0  99   0   0   0|   0     0
>  11   1  87   0   0   0|   0   128k
>
> Philippe
>
> 2011/12/21 Philippe <watche...@gmail.com>
>
>> Hi Aaron,
>>
>> > How many rows are you asking for in the multiget_slice and what thread
>> > pools are showing pending tasks?
>> I am querying in batches of 256 keys max. Each batch may slice between 1
>> and 5 explicit super columns (I need all the columns in each super column;
>> there are at the very most a couple dozen columns per SC).
>>
>> On the first replica, only ReadStage ever shows any pending. All the
>> others have 1 to 10 pending from time to time only. Here's a typical "high
>> pending count" reading on the first replica for the data hotspot:
>> ReadStage    13    5238    10374301128    0    0
>> I've got a watch running every two seconds and I see the numbers vary
>> every time, going from that high point to 0 active, 0 pending. The one
>> thing I've noticed is that I hardly ever see the Active count stay up at
>> the current 2s sampling rate.
>> On the 2 other replicas, I hardly ever see any pendings on ReadStage, and
>> Active hardly goes up to 1 or 2. But I do see a little PENDING on
>> RequestResponseStage; it goes up into the tens or hundreds from time to time.
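
For context, here is a rough sketch of what one of those read batches looks
like at the Thrift level: up to 256 row keys and a handful of explicitly
named super columns fetched in a single multiget_slice at QUORUM. This is
only an illustration of the call Hector ends up making under the hood; the
keyspace, column family, row keys and super column names below are made up.

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.CounterColumn;
import org.apache.cassandra.thrift.CounterSuperColumn;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class MultigetSliceSketch {
    public static void main(String[] args) throws Exception {
        // Cassandra 0.8 listens for framed Thrift on port 9160.
        TFramedTransport transport = new TFramedTransport(new TSocket("node1", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");                         // made-up keyspace

        // Up to 256 row keys per batch (only two shown here).
        List<ByteBuffer> keys = Arrays.asList(
                ByteBuffer.wrap("row-0001".getBytes("UTF-8")),
                ByteBuffer.wrap("row-0002".getBytes("UTF-8")));

        // Naming only the CF in the ColumnParent makes the predicate select
        // whole super columns: here 2 of the 1-5 explicit SC names per batch.
        ColumnParent parent = new ColumnParent("CounterSuperCF");  // made-up CF
        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(Arrays.asList(
                ByteBuffer.wrap("sc-a".getBytes("UTF-8")),
                ByteBuffer.wrap("sc-b".getBytes("UTF-8"))));

        // One round trip for the whole batch, read at CL QUORUM.
        Map<ByteBuffer, List<ColumnOrSuperColumn>> rows =
                client.multiget_slice(keys, parent, predicate, ConsistencyLevel.QUORUM);

        // Counters in a super CF come back as counter_super_column; each SC
        // holds at most a couple dozen CounterColumns.
        long total = 0;
        for (List<ColumnOrSuperColumn> row : rows.values()) {
            for (ColumnOrSuperColumn cosc : row) {
                CounterSuperColumn sc = cosc.getCounter_super_column();
                for (CounterColumn c : sc.getColumns()) {
                    total += c.getValue();
                }
            }
        }
        System.out.println("sum of counters read: " + total);
        transport.close();
    }
}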
>>
>> If I'm flooding that one replica, shouldn't the ReadStage Active count be
>> at maximum capacity?
>>
>> I've already thought of CASSANDRA-2980 but I'm running 0.8.7 and 0.8.9.
>>
>>> Also, what happens when you reduce the number of rows in the request?
>>>
>> I've reduced the requests to batches of 16. I've had to increase the
>> number of threads from 30 to 90 in order to get the same key throughput,
>> because the throughput I measure goes down drastically on a per-thread
>> basis.
>> What I see:
>> - CPU utilization is lower on the first replica (why would that be if
>>   the batches are smaller?)
>> - Pending ReadStage on the first replica seems to be staying higher
>>   longer. It still goes down to 0 regularly.
>> - Lowering to 60 client threads, I see non-zero active MutationStage and
>>   ReplicateOnWriteStage more often.
>> For our use case, the higher the throughput per client thread, the less
>> rework will be done in our processing.
>>
>> Another experiment: I stopped the process that does all the reading and
>> a little of the writing. All that's left is a single-threaded process that
>> sends counter updates as fast as it can in batches of up to 50 mutations.
>> First replica: pending counts go up into the low hundreds and back to 0,
>> active up to 3 or 5 and that's the max. Some MutationStage active & pendings
>> => the process is indeed faster at updating the counters, so that doesn't
>> surprise me given that a counter write requires a read.
>> Second & third replicas: no ReadStage pendings at all. A little
>> RequestResponseStage as earlier.
>>
>> Cheers
>> Philippe
>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 21/12/2011, at 11:57 AM, Philippe wrote:
>>>
>>> Hello,
>>> 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super
>>> columns. Read queries are multiget_slices of super columns, inside of
>>> which I read every column for processing (20-30 at most), using Hector
>>> with default settings.
>>> Watching tpstats on the 3 nodes holding the data most often queried, I
>>> see the pending count increase only on the "main replica", and I see
>>> heavy CPU load and network load only on that node. The other nodes seem
>>> to be doing very little.
>>>
>>> Aren't counter read requests supposed to be round-robined across
>>> replicas? I'm confused as to why the nodes don't exhibit the same load.
>>>
>>> Thanks
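
For completeness, a similar Thrift-level sketch of the counter write path
discussed in the thread: a batch of counter increments (up to ~50 mutations
per batch) sent in one batch_mutate at QUORUM. The keyspace, column family,
row key and column names are again made up. The point, as noted above, is
that a counter increment involves a read before the new value can be
replicated, so write-only load can still generate read work on the replica
that applies it.

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.CounterColumn;
import org.apache.cassandra.thrift.CounterSuperColumn;
import org.apache.cassandra.thrift.Mutation;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class CounterBatchSketch {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("node1", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");                         // made-up keyspace

        // row key -> column family -> mutations; a real batch holds up to ~50.
        Map<ByteBuffer, Map<String, List<Mutation>>> batch =
                new HashMap<ByteBuffer, Map<String, List<Mutation>>>();

        // One increment: +1 on counter "hits" inside super column "sc-a".
        CounterColumn increment =
                new CounterColumn(ByteBuffer.wrap("hits".getBytes("UTF-8")), 1L);
        CounterSuperColumn sc = new CounterSuperColumn(
                ByteBuffer.wrap("sc-a".getBytes("UTF-8")), Arrays.asList(increment));

        ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
        cosc.setCounter_super_column(sc);
        Mutation mutation = new Mutation();
        mutation.setColumn_or_supercolumn(cosc);

        Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
        byCf.put("CounterSuperCF", Arrays.asList(mutation));       // made-up CF
        batch.put(ByteBuffer.wrap("row-0001".getBytes("UTF-8")), byCf);

        // One round trip at CL QUORUM. Each increment is still read back on
        // the replica that applies it before being replicated to the others.
        client.batch_mutate(batch, ConsistencyLevel.QUORUM);
        transport.close();
    }
}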