Re: Replicate On Write behavior

2011-09-09 Thread David Hawthorne
They are evenly distributed: 5 nodes * 40 connections each, using Hector. I can confirm that all 200 were active when this happened (from Hector's perspective, based on graphing Hector's JMX data), that all 5 nodes saw roughly 40 connections, and that all of them were receiving traffic over those connections.

Re: Replicate On Write behavior

2011-09-09 Thread Sylvain Lebresne
We'll solve #2890, and we should have done it sooner. That being said, a quick question: how do you do your inserts from the clients? Are you evenly distributing the inserts among the nodes? Or are you always hitting the same coordinator? Because provided the nodes are correctly distributed on
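For readers wondering what "evenly distributing the inserts among the nodes" means on the client side, here is a minimal round-robin sketch. It is not Hector's API (Hector has its own load-balancing policies); the class name and host addresses are hypothetical. It only shows each node being picked as coordinator for roughly 1/N of the requests.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side helper, not part of Hector or Cassandra.
public class RoundRobinCoordinators {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinCoordinators(List<String> hosts) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("need at least one host");
        this.hosts = List.copyOf(hosts);
    }

    // Hands out hosts in rotation; floorMod keeps the index valid after counter overflow.
    public String nextCoordinator() {
        int i = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }

    public static void main(String[] args) {
        RoundRobinCoordinators rr = new RoundRobinCoordinators(
            List.of("10.0.0.51", "10.0.0.52", "10.0.0.53", "10.0.0.54", "10.0.0.55"));
        Map<String, Integer> counts = new TreeMap<>();
        for (int n = 0; n < 1000; n++) counts.merge(rr.nextCoordinator(), 1, Integer::sum);
        System.out.println(counts);
        // prints: {10.0.0.51=200, 10.0.0.52=200, 10.0.0.53=200, 10.0.0.54=200, 10.0.0.55=200}
    }
}
```

Balanced coordinators alone do not balance counter writes if every coordinator then forwards the update to the same replica, which is the #2890 problem discussed in this thread.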

Re: Replicate On Write behavior

2011-09-08 Thread David Hawthorne
It was exactly due to 2890, and the fact that the first replica is always the one with the lowest-valued IP address. I patched Cassandra to pick a random node out of the replica set in StorageProxy.java findSuitableEndpoint: Random rng = new Random(); return endpoints.get(rng.nextInt(endpoints.
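A standalone sketch contrasting the two behaviours for a single replica set. It is not the actual patch, which is truncated above. `pickLowest` models the "lowest-valued IP wins" behaviour as described in this thread, and `pickRandom` picks uniformly at random, using `ThreadLocalRandom` instead of allocating a `new Random()` per call. The class name, helper names, and all addresses except the .54 node mentioned in the thread are hypothetical. Addresses are compared by unsigned byte value, since plain string comparison would wrongly put "10.0.0.100" before "10.0.0.54".

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration, not Cassandra's StorageProxy code.
public class CounterLeaderChoice {
    // Parses an IP literal; getByName only validates literals, no DNS lookup happens.
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    // Orders addresses numerically by unsigned bytes, so 10.0.0.54 < 10.0.0.100.
    static int compareAddresses(InetAddress a, InetAddress b) {
        byte[] x = a.getAddress(), y = b.getAddress();
        if (x.length != y.length) return Integer.compare(x.length, y.length);
        for (int i = 0; i < x.length; i++) {
            int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (c != 0) return c;
        }
        return 0;
    }

    // Behaviour described in the thread: the lowest-valued address always wins.
    static InetAddress pickLowest(List<InetAddress> replicas) {
        List<InetAddress> sorted = new ArrayList<>(replicas);
        sorted.sort(CounterLeaderChoice::compareAddresses);
        return sorted.get(0);
    }

    // Workaround in the spirit of the patch: any replica, uniformly at random.
    static InetAddress pickRandom(List<InetAddress> replicas) {
        return replicas.get(ThreadLocalRandom.current().nextInt(replicas.size()));
    }

    public static void main(String[] args) {
        List<InetAddress> replicas = List.of(ip("10.0.0.100"), ip("10.0.0.54"), ip("10.0.0.71"));
        Map<String, Integer> lowest = new TreeMap<>(), random = new TreeMap<>();
        for (int n = 0; n < 9000; n++) {
            lowest.merge(pickLowest(replicas).getHostAddress(), 1, Integer::sum);
            random.merge(pickRandom(replicas).getHostAddress(), 1, Integer::sum);
        }
        System.out.println("lowest-IP: " + lowest); // every write lands on 10.0.0.54
        System.out.println("random:    " + random); // roughly a third each
    }
}
```

With the deterministic choice, the node with the lowest address takes the replicate-on-write load for every key whose replica set contains it, which matches the hot spot on the .54 node reported here.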

Re: Replicate On Write behavior

2011-09-02 Thread David Hawthorne
Does it always pick the node with the lowest IP address? All of my hosts are in the same /24. The fourth node in the 5-node cluster has the lowest value in the 4th octet (54). I erased the cluster and rebuilt it from scratch as a 3-node cluster using the first 3 nodes, and now the ReplicateOn

Re: Replicate On Write behavior

2011-09-02 Thread Ian Danforth
That ticket explains a lot; looking forward to a resolution on it. (Sorry I don't have a patch to offer.) Ian On Fri, Sep 2, 2011 at 12:30 AM, Sylvain Lebresne wrote: > On Thu, Sep 1, 2011 at 8:52 PM, David Hawthorne wrote: >> I'm curious... digging through the source, it looks like replicate on

Re: Replicate On Write behavior

2011-09-02 Thread David Hawthorne
That's interesting. I did an experiment in which I added some entropy to the row name based on the time when the increment came in (e.g. row = row + "/" + (timestamp - (timestamp % 300))), and now not only is the load (in GB) on my cluster more balanced, but the performance has not decayed and has s
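The row-name bucketing from this experiment, as a small Java helper. The class and method names are hypothetical. The post does not say what unit the timestamp is in; with seconds, 300 means 5-minute buckets. The expression is kept exactly as written, so it assumes non-negative timestamps (Java's `%` returns a negative remainder for negative inputs).

```java
// Hypothetical helper reproducing row + "/" + (timestamp - (timestamp % 300)).
public class BucketedRowKey {
    // Bucket width from the experiment; 5 minutes if timestamps are in seconds.
    static final long BUCKET = 300;

    // Every increment arriving within the same 300-unit window maps to the same row.
    static String bucketed(String row, long timestamp) {
        return row + "/" + (timestamp - (timestamp % BUCKET));
    }

    public static void main(String[] args) {
        System.out.println(bucketed("pageviews", 1_315_000_123L)); // pageviews/1314999900
        System.out.println(bucketed("pageviews", 1_315_000_199L)); // pageviews/1314999900
        System.out.println(bucketed("pageviews", 1_315_000_200L)); // pageviews/1315000200
    }
}
```

The trade-off is that each row now holds only one window's worth of columns, so a whole-row read stays small and the rows spread across the ring. A total over a longer period has to be assembled by reading and summing several bucket rows.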

Re: Replicate On Write behavior

2011-09-02 Thread Sylvain Lebresne
On Thu, Sep 1, 2011 at 8:52 PM, David Hawthorne wrote: > I'm curious... digging through the source, it looks like replicate on write > triggers a read of the entire row, and not just the columns/supercolumns that > are affected by the counter update. Is this the case? It would certainly > exp

Re: Replicate On Write behavior

2011-09-01 Thread Yang
Sorry, I mean CF * row. If you look in the code, db.cf is basically just a set of columns. On Sep 1, 2011 1:36 PM, "Ian Danforth" wrote: > I'm not sure I understand the scalability of this approach. A given > column family can be HUGE with millions of rows and columns. In my > cluster I have a sin

Re: Replicate On Write behavior

2011-09-01 Thread Konstantin Naryshkin
disk. - Original Message - From: "Ian Danforth" To: user@cassandra.apache.org Sent: Thursday, September 1, 2011 4:35:33 PM Subject: Re: Replicate On Write behavior I'm not sure I understand the scalability of this approach. A given column family can be HUGE with millions of r

Re: Replicate On Write behavior

2011-09-01 Thread Ian Danforth
I'm not sure I understand the scalability of this approach. A given column family can be HUGE, with millions of rows and columns. In my cluster I have a single column family that accounts for 90GB of load on each node. Not only that, but the column family is distributed over the entire ring. Clearly I'm

Re: Replicate On Write behavior

2011-09-01 Thread Yang
When Cassandra reads, the entire CF is always read together; only at the hand-over to the client does the pruning happen. On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne wrote: > I'm curious... digging through the source, it looks like replicate on write > triggers a read of the entire row, and not
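A toy model of the read path as described here: the whole row (a set of columns) is materialized first and pruned to the requested columns only at hand-over. This is not Cassandra code. The class, the `Result` holder, and the column names are hypothetical, and "columns read" stands in for I/O cost. It shows why, under this description, the cost of a replicate-on-write read tracks the row's width rather than the number of counters the update touched.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy model; a row is a sorted map of column name -> counter value.
public class ReadThenPrune {
    // What a read hands back: the pruned columns plus how many were materialized.
    static final class Result {
        final SortedMap<String, Long> columns;
        final int columnsRead;

        Result(SortedMap<String, Long> columns, int columnsRead) {
            this.columns = columns;
            this.columnsRead = columnsRead;
        }
    }

    // Copies the entire row first, then keeps only the requested columns.
    static Result read(SortedMap<String, Long> row, Set<String> wanted) {
        SortedMap<String, Long> full = new TreeMap<>(row); // whole row copied: cost ~ row width
        int read = full.size();
        full.keySet().retainAll(wanted);                   // pruning happens only at hand-over
        return new Result(full, read);
    }

    public static void main(String[] args) {
        SortedMap<String, Long> row = new TreeMap<>();
        for (long i = 0; i < 10_000; i++) row.put("c" + i, i);
        Result r = read(row, Set.of("c42"));
        System.out.println(r.columns + " after reading " + r.columnsRead + " columns");
        // prints: {c42=42} after reading 10000 columns
    }
}
```

If every counter increment triggers a read like this, a row that keeps gaining columns makes each increment slower over time, which is consistent with the inserts/sec decay reported at the start of the thread.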

Replicate On Write behavior

2011-09-01 Thread David Hawthorne
I'm curious... digging through the source, it looks like replicate on write triggers a read of the entire row, and not just the columns/supercolumns that are affected by the counter update. Is this the case? It would certainly explain why my inserts/sec decay over time and why the average inse