> I am querying in batches of 256 keys max. Each batch may slice between 1 and 
> 5 explicit super columns (I need all the columns in each super column, there 
> are at the very most a couple dozen columns per SC).
That's a pretty high row count; bigger is not always better.

I just remembered you are using the BOP (ByteOrderedPartitioner). Are the rows 
you are reading all on the same node? Is the load evenly distributed across 
the cluster? It sounds like a single node is getting overloaded while the 
others are doing very little.
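
To see why the BOP can do that, here is a toy sketch (plain Java, not 
Cassandra code; the ring boundaries and key names are made up) of how 
byte-ordered tokens map nearby keys onto the same node:

    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Toy model of a 5-node BOP ring: the "token" is just the key's bytes,
    // so lexicographically close keys fall into the same token range.
    public class BopHotspot {
        public static void main(String[] args) {
            NavigableMap<String, String> ring = new TreeMap<String, String>();
            ring.put("e", "node1"); ring.put("j", "node2");
            ring.put("o", "node3"); ring.put("t", "node4");
            ring.put("z", "node5");

            for (String key : new String[] { "user:100", "user:101", "user:102" }) {
                // the first boundary >= the key owns it; wrap around if none
                Map.Entry<String, String> owner = ring.ceilingEntry(key);
                System.out.println(key + " -> "
                    + (owner == null ? "node1" : owner.getValue()));
            }
            // all three sequential keys print node5 - that's the hotspot
        }
    }

nodetool ring will show you the real token ranges, so you can check whether 
your 256-key batches all land in one of them.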

Regarding your isolated experiment:
> Another experiment: I stopped the process that does all the reading and a 
> little of the writing. All that's left is a single-threaded process sending 
> counter updates as fast as it can in batches of up to 50 mutations.
> First replica: pending counts go up into the low hundreds and back to 0, 
> active up to 3 or 5 at most. Some MutationStage active & pending tasks => 
> the process is indeed faster at updating the counters now, so that doesn't 
> surprise me given that a counter write requires a read.
> Second & third replicas: no ReadStage pendings at all. A little 
> RequestResponseStage as earlier.
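
That read is expected. As I understand the 0.8 counter design 
(CASSANDRA-1072), the replica coordinating an increment applies the delta 
locally, reads the resulting value back, and ships the full value to the 
other replicas. Roughly, in pseudo-Java with invented helper names:

    // Conceptual sketch only, not Cassandra source; all names are invented.
    void counterWrite(byte[] key, long delta) {
        applyDeltaLocally(key, delta);       // a normal memtable write
        long total = readCurrentValue(key);  // the read a counter write needs
        replicateFullValue(key, total);      // shows up as ReplicateOnWriteStage
                                             // locally, RequestResponseStage
                                             // on the other replicas
    }

So the node coordinating the increments does read work the other two replicas 
never see, which also fits the iostat numbers you posted.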


What CL are you using? 
Which thread pool is showing pending? 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/12/2011, at 11:15 AM, Philippe wrote:

> Along the same lines as the last experiment I did (the cluster is only 
> being updated by a single-threaded batching process):
> All nodes are the same hardware & configuration. Why on earth would one node 
> require disk IO and not the other 2 replicas?
> 
> Primary replica shows some disk activity (iostat shows about 40%):
> ----total-cpu-usage---- -dsk/total- 
> usr sys idl wai hiq siq| read  writ
> 67  10  19   2   0   3|4244k  364k|
> 
> whereas the 2nd & 3rd replicas do not:
> ----total-cpu-usage---- -dsk/total- 
> usr sys idl wai hiq siq| read  writ
> 42  13  41   0   0   3|   0     0 |
>  47  15  34   0   0   4|4096B  185k
>  49  14  35   0   0   3|   0  8192B
>  47  16  33   0   0   4|   0  4096B
>  44  13  41   0   0   3| 284k  112k
> 
> 3rd replica:
> 11   2  87   1   0   0|   0   136k|
>   0   0  99   0   0   0|   0     0 
>   9   1  90   0   0   0|4096B  128k
>   2   2  96   0   0   0|   0     0 
>   0   0  99   0   0   0|   0     0 
>  11   1  87   0   0   0|   0   128k
> 
> 
> Philippe
> 2011/12/21 Philippe <watche...@gmail.com>
> Hi Aaron,
> 
> >How many rows are you asking for in the multiget_slice and what thread pools 
> >are showing pending tasks?
> I am querying in batches of 256 keys max. Each batch may slice between 1 and 
> 5 explicit super columns (I need all the columns in each super column, there 
> are at the very most a couple dozen columns per SC).
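> 
> For reference, one batch boils down to a thrift multiget_slice along these 
> lines (a sketch only: "Counters", "sc1"/"sc2" are placeholders, not our real 
> schema, and the CL is whatever Hector is configured with - I believe its 
> default policy is quorum):
> 
>     import java.nio.ByteBuffer;
>     import java.util.Arrays;
>     import java.util.List;
>     import java.util.Map;
>     import org.apache.cassandra.thrift.*;
>     import org.apache.cassandra.utils.ByteBufferUtil;
> 
>     public class BatchRead {
>         static Map<ByteBuffer, List<ColumnOrSuperColumn>> readBatch(
>                 Cassandra.Client client, List<ByteBuffer> keys) throws Exception {
>             ColumnParent parent = new ColumnParent("Counters");
>             SlicePredicate predicate = new SlicePredicate();
>             predicate.setColumn_names(Arrays.asList(   // the 1 to 5 explicit SCs
>                 ByteBufferUtil.bytes("sc1"), ByteBufferUtil.bytes("sc2")));
>             return client.multiget_slice(keys,         // up to 256 keys
>                 parent, predicate, ConsistencyLevel.QUORUM);
>         }
>     }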
> 
> On the first replica, only ReadStage ever shows any pending. All the others 
> have 1 to 10 pending from time to time only. Here's a typical "high pending 
> count" reading on the first replica for the data hotspot:
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                        13      5238    10374301128         0                 0
> I've got a watch running every two seconds and I see the numbers vary every 
> time, going from that high point to 0 active, 0 pending. The one thing I've 
> noticed is that I hardly ever see the Active count stay up at the current 2s 
> sampling rate. 
> On the 2 other replicas, I hardly ever see any pending on ReadStage, and 
> Active rarely goes above 1 or 2. But I do see a little pending on 
> RequestResponseStage; it goes up into the tens or hundreds from time to time.
> 
> 
> If I'm flooding that one replica, shouldn't the ReadStage Active count be at 
> maximum capacity?
> 
> 
> I've already thought of CASSANDRA-2980, but I'm running 0.8.7 and 0.8.9.
> 
> >Also, what happens when you reduce the number of rows in the request?
> I've reduced the requests to batches of 16. I've had to increase the number 
> of threads from 30 to 90 in order to get the same key throughput, because the 
> throughput I measure goes down drastically on a per-thread basis.
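> In case it matters, the batching itself is just list slicing, something like 
> this (BATCH was 256 before, 16 now; "keys" is the full key list):
> 
>     // hypothetical helper, just to show how the batches are cut
>     for (int i = 0; i < keys.size(); i += BATCH) {
>         List<String> batch = keys.subList(i, Math.min(i + BATCH, keys.size()));
>         // one multiget_slice per batch, as in the earlier sketch
>     }
> 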
> What I see:
>  - CPU utilization is lower on the first replica (why would that be, if the 
> batches are smaller?)
>  - Pending ReadStage on the first replica seems to stay higher for longer, 
> but still goes down to 0 regularly.
>  - Lowering to 60 client threads, I see non-zero active MutationStage and 
> ReplicateOnWriteStage more often.
> For our use-case, the higher the throughput per client thread, the less 
> rework will be done in our processing.
> 
> Another experiment: I stopped the process that does all the reading and a 
> little of the writing. All that's left is a single-threaded process sending 
> counter updates as fast as it can in batches of up to 50 mutations.
> First replica: pending counts go up into the low hundreds and back to 0, 
> active up to 3 or 5 at most. Some MutationStage active & pending tasks => 
> the process is indeed faster at updating the counters now, so that doesn't 
> surprise me given that a counter write requires a read.
> Second & third replicas: no ReadStage pendings at all. A little 
> RequestResponseStage as earlier.
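> 
> The writer amounts to a thrift batch_mutate of counter increments, roughly 
> as below (a sketch: same imports and client as the read sketch above, plus 
> java.util.HashMap; the CF, keys, names and CL are invented placeholders):
> 
>     Map<ByteBuffer, Map<String, List<Mutation>>> batch =
>         new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
>     for (int i = 0; i < 50; i++) {                 // up to 50 mutations
>         CounterColumn inc = new CounterColumn(ByteBufferUtil.bytes("hits"), 1L);
>         ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
>         cosc.setCounter_super_column(new CounterSuperColumn(
>             ByteBufferUtil.bytes("sc1"), Arrays.asList(inc)));
>         Mutation m = new Mutation();
>         m.setColumn_or_supercolumn(cosc);
>         Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
>         byCf.put("Counters", Arrays.asList(m));
>         batch.put(ByteBufferUtil.bytes("row" + i), byCf);
>     }
>     client.batch_mutate(batch, ConsistencyLevel.ONE); // CL here is a guess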
> 
> Cheers
> Philippe 
> 
> 
> On 21/12/2011, at 11:57 AM, Philippe wrote:
> 
>> Hello,
>> 5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super 
>> columns. Read queries are multiget_slices of super columns, inside of which 
>> I read every column for processing (20-30 at most), using Hector with 
>> default settings.
>> Watching tpstats on the 3 nodes holding the data that is most often 
>> queried, I see the pending count increase only on the "main replica", and I 
>> see heavy CPU load and network load only on that node. The other nodes seem 
>> to be doing very little.
>> 
>> Aren't counter read requests supposed to be round-robined across replicas? 
>> I'm confused as to why the nodes don't exhibit the same load.
>> 
>> Thanks
