That's interesting. I ran an experiment in which I added some entropy to the row name based on the time the increment came in (e.g. row = row + "/" + (timestamp - (timestamp % 300))). Now not only is the load (in GB) on my cluster more balanced, but performance has also stopped decaying: inserts/sec have stayed steady, with a relatively low average ms/insert. Each row is now significantly shorter as a result of this change.
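A minimal sketch of the time-bucketing scheme above, in Python. It assumes `timestamp` is Unix time in seconds, so the 300 gives 5-minute buckets. `bucketed_row_key` and `bucket_keys` are hypothetical helper names, not part of any Cassandra client API. The second helper shows the trade-off: reading a total for a time range now means reading (and summing) one row per bucket.

```python
import time
from typing import List, Optional

BUCKET_SECONDS = 300  # 5-minute buckets, matching the 300 in the expression above


def bucketed_row_key(row: str, timestamp: Optional[int] = None) -> str:
    """Append the start of the timestamp's bucket to the row key (hypothetical helper)."""
    if timestamp is None:
        timestamp = int(time.time())  # assumption: seconds since the Unix epoch
    bucket_start = timestamp - (timestamp % BUCKET_SECONDS)
    return f"{row}/{bucket_start}"


def bucket_keys(row: str, start: int, end: int) -> List[str]:
    """List every bucketed row key covering [start, end], inclusive (hypothetical helper)."""
    first = start - (start % BUCKET_SECONDS)
    return [f"{row}/{b}" for b in range(first, end + 1, BUCKET_SECONDS)]


# Increments within the same 5-minute window land in the same (shorter) row.
print(bucketed_row_key("pageviews", 1314950400))  # pageviews/1314950400
print(bucketed_row_key("pageviews", 1314950699))  # pageviews/1314950400
print(bucketed_row_key("pageviews", 1314950700))  # pageviews/1314950700
```

Since the partitioner hashes each distinct row key on its own, successive buckets of one hot counter row can map to different nodes, which is consistent with the more even load reported above.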
On Sep 2, 2011, at 12:30 AM, Sylvain Lebresne wrote:

> On Thu, Sep 1, 2011 at 8:52 PM, David Hawthorne <dha...@gmx.3crowd.com> wrote:
>> I'm curious... digging through the source, it looks like replicate on write
>> triggers a read of the entire row, and not just the columns/supercolumns
>> that are affected by the counter update. Is this the case? It would
>> certainly explain why my inserts/sec decay over time and why the average
>> insert latency increases over time. The strange thing is that I'm not
>> seeing disk read IO increase over that same period, but that might be due to
>> the OS buffer cache...
>
> It does not. It only reads the columns/supercolumns affected by the
> counter update.
> In the source, this happens in CounterMutation.java. If you look at
> addReadCommandFromColumnFamily you'll see that it does a query by name
> only for the column involved in the update (the update is basically
> the content of the columnFamily parameter there).
>
> And Cassandra does *not* always read a full row. Never had, never will.
>
>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with
>> ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that
>> normal? I'm using RandomPartitioner...
>>
>> Address    DC          Rack   Status State   Load       Owns    Token
>>                                                                 136112946768375385385349842972707284580
>> 10.0.0.57  datacenter1 rack1  Up     Normal  2.26 GB    20.00%  0
>> 10.0.0.56  datacenter1 rack1  Up     Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
>> 10.0.0.55  datacenter1 rack1  Up     Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
>> 10.0.0.54  datacenter1 rack1  Up     Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
>> 10.0.0.72  datacenter1 rack1  Up     Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
>>
>> The nodes with ReplicateOnWrites are the 3 in the middle. The first node
>> and last node both have a count of 0. This is a clean cluster, and I've
>> been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12
>> hours. The last time this test ran, it went all the way down to 500
>> inserts/sec before I killed it.
>
> Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.
>
> --
> Sylvain
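The Token column in the ring output above matches evenly spaced RandomPartitioner tokens: node i gets i * (2**127 // 5). A short Python check, assuming the RandomPartitioner token space runs from 0 to 2**127; `balanced_tokens` is a hypothetical helper, not a Cassandra tool.

```python
from typing import List

# Assumption: RandomPartitioner tokens live in the range [0, 2**127].
TOKEN_SPACE = 2**127


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial tokens, one per node (hypothetical helper)."""
    step = TOKEN_SPACE // node_count
    return [i * step for i in range(node_count)]


for token in balanced_tokens(5):
    print(token)
```

Because the tokens are evenly spaced (hence 20.00% ownership each), the uneven Load column (383.25 MB to 2.52 GB) more likely reflects how much row data hashes into each range than the token assignment itself.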