Ahh, I see your point. Thanks for the help, Stephen.
On Mon, Nov 7, 2011 at 12:43 PM, Stephen Connolly
<stephen.alan.conno...@gmail.com> wrote:

At that point, your cluster will either have so much data on each node that
you will need to split them, keeping RF=5 so you have 10 nodes... or the
intra-cluster traffic will swamp you and you will split each node, keeping
RF=5 so you have 10 nodes again.

The safest thing is not to design with the assumption that RF=N.

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen

On 7 Nov 2011 17:47, "Riyad Kalla" <rka...@gmail.com> wrote:

Stephen,

I appreciate you making the point more strongly; I won't make this decision
lightly given the stress you are putting on it, but the technical aspects of
this make me curious...

If I start with RF=N (number of nodes) now, and in 2 years (hypothetically)
my dataset is too large and I say to myself "Dangit, Stephen was right...",
couldn't I just change the RF to some smaller value, say "3", at that point?
Or would the Cassandra ring not rebalance the data set nicely at that point?

More specifically, would it not know how best to slowly remove extraneous
copies from the nodes and make the data more sparse among the ring members?

Thanks for the hand-holding; it is helping me understand the operational
landscape quickly.

-R

On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly
<stephen.alan.conno...@gmail.com> wrote:

Plan for the future....

At some point your data set will become too big for the node that it is
running on, or your load will force you to split nodes.... once you do
that, RF < N.

To solve performance issues with C*, the solution is to add more nodes.

To solve storage issues with C*, the solution is to add more nodes.

In most cases the solution in C* is to add more nodes.

Don't assume RF = number of nodes as a core design decision of your
application and you will not have your ass bitten.

;-)

-Stephen
P.S. making the point more extreme to make it clear

On 7 November 2011 15:04, Riyad Kalla <rka...@gmail.com> wrote:

Stephen,

Excellent breakdown; I appreciate all the detail.

Regarding your last comment about RF being smaller than N (number of
nodes) -- in my particular case my data set isn't particularly large (a few
GB) and is distributed globally across a handful of data centers. What I am
utilizing Cassandra for is the replication, in order to minimize latency for
requests. So when a request comes into any location, I want each node in the
ring to contain the full data set so it never needs to defer to another
member of the ring to answer a question (even if this means eventual
consistency, that is alright in my case).

Given that, the way I've understood this discussion so far is that I would
have an RF of N (my total node count), but my Consistency Level with all my
writes will *likely* be QUORUM -- I think that is a good/safe default for me
to use, as writes aren't the scenario I need to optimize for latency; that
being said, I also don't want to wait for a ConsistencyLevel of ALL to
complete before my code continues.

Would you agree with this assessment, or am I missing the boat on something?

Best,
Riyad
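(A minimal sketch of the RF question above: the replication factor is a
per-keyspace setting and can be raised or lowered after the fact. This uses
present-day CQL through the DataStax Python driver purely for illustration;
the keyspace name, node address, and numbers are made up, and the cleanup
note reflects the usual operational procedure rather than anything stated in
this thread.)

    # Illustrative sketch only: RF is declared per keyspace, not per cluster,
    # and can be changed later, e.g. if the data set outgrows RF=N.
    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()   # any node in the ring will do

    # Start with RF equal to the node count (5 in this thread's example)...
    session.execute(
        "CREATE KEYSPACE demo WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 5}"
    )

    # ...and shrink it later if the data set outgrows the nodes.
    session.execute(
        "ALTER KEYSPACE demo WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 3}"
    )
    # Cassandra does not delete the now-extraneous copies on its own; the
    # usual step after lowering RF is to run 'nodetool cleanup' on each node
    # so it drops data it no longer replicates.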
On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly
<stephen.alan.conno...@gmail.com> wrote:

Consistency Level is a pseudo-enum...

You have the choice between:

ONE
QUORUM (and there are different types of this)
ALL

At CL=ONE, only one node is guaranteed to have got the write if the
operation is a success.
At CL=ALL, all nodes that the RF says it should be stored at must confirm
the write before the operation succeeds, but a partial write will succeed
eventually if at least one node recorded the write.
At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
operation to succeed, otherwise failure; but a partial write will succeed
eventually if at least one node recorded the write.

Read repair will eventually ensure that the write is replicated across all
RF nodes in the cluster.

The N in QUORUM above depends on the type of QUORUM you choose; in general
think N=RF unless you choose a fancy QUORUM.

To have a consistent read, CL of write + CL of read must be > RF...

Write at ONE, read at ONE => may not get the most recent write if RF > 1
[fastest write, fastest read] {data loss possible if node lost before read
repair}
Write at QUORUM, read at ONE => consistent read [moderate write, fastest
read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at ONE => consistent read, writes may be blocked if any
node fails [slowest write, fastest read]

Write at ONE, read at QUORUM => may not get the most recent write if RF > 2
[fastest write, moderate read] {data loss possible if node lost before read
repair}
Write at QUORUM, read at QUORUM => consistent read [moderate write, moderate
read] {multiple nodes must be lost for data loss to be possible}
Write at ALL, read at QUORUM => consistent read, writes may be blocked if
any node fails [slowest write, moderate read]

Write at ONE, read at ALL => consistent read, reads may fail if any node
fails [fastest write, slowest read] {data loss possible if node lost before
read repair}
Write at QUORUM, read at ALL => consistent read, reads may fail if any node
fails [moderate write, slowest read] {multiple nodes must be lost for data
loss to be possible}
Write at ALL, read at ALL => consistent read, writes may be blocked if any
node fails, reads may fail if any node fails [slowest write, slowest read]

Note: You can choose the CL for each and every operation. This is something
that you should design into your application (unless you exclusively use
QUORUM for all operations, in which case you are advised to bake the logic
in, but it is less necessary).

The other thing to remember is that RF does not have to equal the number of
nodes in your cluster... in fact I would recommend designing your app on the
basis that RF < number of nodes in your cluster... because at some point,
when your data set grows big enough, you will end up with RF < number of
nodes.

-Stephen
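(A quick worked version of the rule behind that table, in plain Python; the
numbers simply stand in for how many replicas each consistency level waits
on, assuming N=RF as above.)

    # "Write CL + read CL > RF" means the write and read replica sets overlap.
    def quorum(rf):
        return rf // 2 + 1          # ((N/2)+1) with integer division

    def read_is_consistent(write_replicas, read_replicas, rf):
        # A read is guaranteed to see the latest write when the replica
        # counts used for the write and the read sum to more than RF.
        return write_replicas + read_replicas > rf

    RF = 5                                                 # the 5-node example
    print(quorum(RF))                                      # 3
    print(read_is_consistent(1, 1, RF))                    # ONE / ONE -> False
    print(read_is_consistent(quorum(RF), quorum(RF), RF))  # QUORUM / QUORUM -> True
    print(read_is_consistent(1, RF, RF))                   # ONE / ALL -> True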
On 7 November 2011 13:03, Riyad Kalla <rka...@gmail.com> wrote:

Ah! Ok, I was interpreting what you were saying to mean that if my RF was
too high, then the ring would die if I lost one node.

Ultimately what I want (I think) is:

Replication Factor: 5 (aka "all of my nodes")
Consistency Level: 2

Put another way, when I write a value, I want it to exist on two servers *at
least* before I consider that write "successful" enough for my code to
continue, but in the background I would like Cassandra to keep copying that
value around at its leisure until all the ring nodes know about it.

This sounds like what I need. Thanks for pointing me in the right direction.

Best,
Riyad

On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda
<anthony.ikeda....@gmail.com> wrote:

Riyad, I'm also just getting to know the different settings and values
myself :)

I believe (and it also depends on your config) CL.ONE should ignore the loss
of a node if your RF is 5; once you increase the CL, then if you lose a node
the CL is not met and you will get exceptions returned.

Sent from my iPhone

On 07/11/2011, at 4:32, Riyad Kalla <rka...@gmail.com> wrote:

Anthony and Jaydeep, thank you for weighing in. I am glad to see that they
are two different values (makes more sense mentally to me).

Anthony, what you said caught my attention: "to ensure all nodes have a copy
you may not be able to survive the loss of a single node." Why would this be
the case?

I assumed (incorrectly?) that a node would simply disappear off the map
until I could bring it back up again, at which point it would slowly
retrieve from other members of the ring all the values it missed while it
was down. Is this the wrong understanding?

If forcing a replication factor equal to the number of nodes in my ring will
cause a hard stop when one ring member goes down (as I understood your
comment to mean), it seems to me I should go with a much lower replication
factor... something along the lines of 3, or roughly ceiling(N / 2), and
just deal with the latency when one of the nodes has to route a request to
another server when it doesn't contain the value.

Is there a better way to accomplish what I want, or is keeping the
replication factor that aggressively high generally a bad thing and using
Cassandra in the "wrong" way?

Thank you for the help.

-Riyad

On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep
<chovatia_jayd...@yahoo.co.in> wrote:

Hi Riyad,

You can set replication = 5 (the number of replicas) and write with CL =
ONE. There is no hard requirement from Cassandra to write with CL=ALL to
replicate the data unless you need it. Considering your example: if you
write with CL=ONE, it will still replicate your data to all 5 replicas
eventually.

Thank you,
Jaydeep
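(A minimal sketch of what Jaydeep describes, using the modern DataStax
Python driver purely for illustration; the keyspace, table, column names,
and address are made up. The point is that the consistency level is chosen
per statement and only controls how many acknowledgements the coordinator
waits for; with RF=5 the write is still sent to all five replicas.)

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["10.0.0.1"]).connect("demo")

    # An acknowledgement from a single replica is enough for the call to
    # return...
    write = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE,
    )
    session.execute(write, (42, "riyad"))
    # ...while hinted handoff, read repair, and anti-entropy repair take care
    # of any replica that missed the write, so all RF=5 copies converge
    # eventually.

Swapping ConsistencyLevel.ONE for ConsistencyLevel.TWO would give the "on at
least two servers before my code continues" behaviour described above.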
________________________________
From: Riyad Kalla <rka...@gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Sent: Sunday, 6 November 2011 9:50 PM
Subject: Will writes with < ALL consistency eventually propagate?

I am new to Cassandra and was curious about the following scenario...

Let's say I have a ring of 5 servers. Ultimately I would like each server to
be a full replication of the next (master-master-*).

In a presentation I watched today on Cassandra, the presenter mentioned that
the ring members will shard data and route your requests to the right host
when they come in to a server that doesn't physically contain the value you
wanted. To the requesting client this is seamless, except for the added
latency.

If I wanted to avoid the routing and latency and ensure every server had the
full data set, do I have to write with a consistency level of ALL and wait
for all of those writes to return in my code, or can I write with a CL of 1
or 2 and let the ring propagate the rest of the copies to the other servers
in the background after my code has continued executing?

I don't mind eventual consistency in my case, but I do (eventually) want all
nodes to have all values, and I cannot tell if this is default behavior, or
if sharding is the default and I can only force duplicates onto the other
servers explicitly with a CL of ALL.

Best,
Riyad
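(For the read side of that question, another illustrative sketch with the
modern DataStax Python driver and the same made-up names: when every node
holds a full replica, a CL=ONE read pointed at the nearest node can be
answered without routing to another replica, at the cost of eventual rather
than strong consistency.)

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Contact the node in the local data center so the coordinator is nearby.
    session = Cluster(["10.0.0.1"]).connect("demo")

    read = SimpleStatement(
        "SELECT name FROM users WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE,   # fastest read, may be stale
    )
    row = session.execute(read, (42,)).one()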