Sorry, I mean CF * row, i.e. the set of columns of a single row within the column family.
If you look in the code, db.cf is basically just a set of columns.

On Sep 1, 2011 1:36 PM, "Ian Danforth" <idanfo...@numenta.com> wrote:
> I'm not sure I understand the scalability of this approach. A given
> column family can be HUGE, with millions of rows and columns. In my
> cluster I have a single column family that accounts for 90GB of load
> on each node. Not only that, but the column family is distributed over
> the entire ring.
>
> Clearly I'm misunderstanding something.
>
> Ian
>
> On Thu, Sep 1, 2011 at 1:17 PM, Yang <teddyyyy...@gmail.com> wrote:
>> when Cassandra reads, the entire CF is always read together; only at the
>> hand-over to the client does the pruning happen
>>
>> On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne <dha...@gmx.3crowd.com>
>> wrote:
>>>
>>> I'm curious... digging through the source, it looks like replicate on
>>> write triggers a read of the entire row, and not just the
>>> columns/supercolumns that are affected by the counter update. Is this
>>> the case? It would certainly explain why my inserts/sec decay over time
>>> and why the average insert latency increases over time. The strange
>>> thing is that I'm not seeing disk read IO increase over that same
>>> period, but that might be due to the OS buffer cache...
>>>
>>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with
>>> ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that
>>> normal? I'm using RandomPartitioner...
>>>
>>> Address    DC           Rack   Status  State   Load       Owns    Token
>>>                                                                   136112946768375385385349842972707284580
>>> 10.0.0.57  datacenter1  rack1  Up      Normal  2.26 GB    20.00%  0
>>> 10.0.0.56  datacenter1  rack1  Up      Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
>>> 10.0.0.55  datacenter1  rack1  Up      Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
>>> 10.0.0.54  datacenter1  rack1  Up      Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
>>> 10.0.0.72  datacenter1  rack1  Up      Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
>>>
>>> The nodes with ReplicateOnWrite tasks are the 3 in the middle. The first
>>> node and the last node both have a count of 0. This is a clean cluster,
>>> and I've been doing 3k ... 2.5k (decaying performance) inserts/sec for
>>> the last 12 hours. The last time this test ran, it went all the way down
>>> to 500 inserts/sec before I killed it.
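
[Editor's note: for readers unfamiliar with the "CF * row" shorthand above, here is a minimal illustrative sketch in Java. This is NOT the real org.apache.cassandra.db.ColumnFamily class; the class and method names are hypothetical. It only models the point being made in the thread: the in-memory object a read builds up is the set of columns belonging to one row, and pruning down to the columns the client actually asked for happens only after that whole set has been assembled.]

import java.nio.ByteBuffer;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for the "db.cf" object discussed above: one row's
// worth of columns, kept sorted by column name. Not the real Cassandra class.
final class RowColumnsSketch {
    private final NavigableMap<ByteBuffer, ByteBuffer> columns = new TreeMap<>();

    void addColumn(ByteBuffer name, ByteBuffer value) {
        columns.put(name, value);
    }

    // Models "the entire CF * row is read together": the read path first
    // materializes every column of the row ...
    NavigableMap<ByteBuffer, ByteBuffer> readWholeRow() {
        return new TreeMap<>(columns);
    }

    // ... and only afterwards, at the hand-over to the client, is the result
    // pruned down to the slice that was actually requested.
    static NavigableMap<ByteBuffer, ByteBuffer> pruneToSlice(
            NavigableMap<ByteBuffer, ByteBuffer> wholeRow,
            ByteBuffer start, ByteBuffer finish) {
        return wholeRow.subMap(start, true, finish, true);
    }
}

[Under that model it is easy to see why a counter replicate-on-write that has to re-read the row would slow down as rows grow: the cost of readWholeRow() scales with the total number of columns in the row, not with the handful of counter columns being updated, which is consistent with the decaying insert rate reported below.]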