That's interesting.  I ran an experiment in which I added a time-based suffix
to the row name, derived from when the increment came in (e.g. row = row + "/"
+ (timestamp - (timestamp % 300))).  The load (in GB) on my cluster is now
more balanced, and performance (inserts/sec) has held steady instead of
decaying, with a relatively low average ms/insert.  Each row is now
significantly shorter as a result of this change.
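
For anyone who wants to try the same thing, here is a rough Python sketch of
the bucketing expression above.  The function names and the bucket_size
parameter are made up for illustration, and it assumes integer timestamps;
only the 300 and the "/" separator come from my test.

```python
def bucketed_row_key(row: str, timestamp: int, bucket_size: int = 300) -> str:
    """Append the start of the time bucket containing `timestamp` to the row name.

    Hypothetical helper: mirrors row + "/" + (timestamp - (timestamp % 300)).
    """
    bucket_start = timestamp - (timestamp % bucket_size)
    return f"{row}/{bucket_start}"


def bucket_keys_for_range(row: str, start: int, end: int,
                          bucket_size: int = 300) -> list[str]:
    """List every bucketed row key that covers the inclusive range [start, end].

    Hypothetical helper: a reader has to visit each of these rows and sum
    the counters, since one logical row is now split across buckets.
    """
    first_bucket = start - (start % bucket_size)
    return [f"{row}/{b}" for b in range(first_bucket, end + 1, bucket_size)]
```

The trade-off is on the read side: every increment within one bucket still
lands in the same row, but a total over a longer period means reading one row
per bucket and adding the results up in the client.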



On Sep 2, 2011, at 12:30 AM, Sylvain Lebresne wrote:

> On Thu, Sep 1, 2011 at 8:52 PM, David Hawthorne <dha...@gmx.3crowd.com> wrote:
>> I'm curious... digging through the source, it looks like replicate on write 
>> triggers a read of the entire row, and not just the columns/supercolumns 
>> that are affected by the counter update.  Is this the case?  It would 
>> certainly explain why my inserts/sec decay over time and why the average 
>> insert latency increases over time.  The strange thing is that I'm not 
>> seeing disk read IO increase over that same period, but that might be due to 
>> the OS buffer cache...
> 
> It does not. It only reads the columns/supercolumns affected by the
> counter update.
> In the source, this happens in CounterMutation.java. If you look at
> addReadCommandFromColumnFamily you'll see that it does a query by name
> only for the column involved in the update (the update is basically
> the content of the columnFamily parameter there).
> 
> And Cassandra does *not* always read a full row. Never has, never will.
> 
>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with 
>> ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that 
>> normal?  I'm using RandomPartitioner...
>> 
>> Address      DC           Rack   Status  State   Load       Owns    Token
>>                                                                     136112946768375385385349842972707284580
>> 10.0.0.57    datacenter1  rack1  Up      Normal  2.26 GB    20.00%  0
>> 10.0.0.56    datacenter1  rack1  Up      Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
>> 10.0.0.55    datacenter1  rack1  Up      Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
>> 10.0.0.54    datacenter1  rack1  Up      Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
>> 10.0.0.72    datacenter1  rack1  Up      Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
>> 
>> The nodes with ReplicateOnWrites are the 3 in the middle.  The first node 
>> and last node both have a count of 0.  This is a clean cluster, and I've 
>> been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 
>> hours.  The last time this test ran, it went all the way down to 500 
>> inserts/sec before I killed it.
> 
> Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.
> 
> --
> Sylvain
