At that point, your cluster will either have so much data on each node that you will need to split them, keeping RF=5 so you have 10 nodes... or the intra-cluster traffic will swamp you and you will split each node, keeping RF=5 so you have 10 nodes again.

The safest thing is not to design with the assumption that RF=N.

-Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using Swype to type on the screen.
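Riyad's question quoted below, whether the RF can simply be dropped to a smaller value later, comes down to a keyspace schema change plus a cleanup pass on each node. A minimal sketch, assuming the modern CQL syntax and Python driver (both postdate this 2011 thread) and a hypothetical keyspace named app_data:

    # Minimal sketch (not from the thread): lowering a keyspace's replication
    # factor. The contact point and keyspace name are illustrative.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # Dropping RF from 5 to 3 is only a metadata change on the keyspace.
    session.execute(
        "ALTER KEYSPACE app_data "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
    )

    # Each node still holds the surplus copies until they are reclaimed,
    # e.g. by running `nodetool cleanup` on every node in the ring.
    cluster.shutdown()

Going the other way (raising RF) is also just a schema change, but it should be followed by a repair so the newly responsible replicas actually receive the existing data.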
On 7 Nov 2011 17:47, "Riyad Kalla" <rka...@gmail.com> wrote:
> Stephen,
>
> I appreciate you making the point more strongly; I won't make this decision lightly given the stress you are putting on it, but the technical aspects of this make me curious...
>
> If I start with RF=N (number of nodes) now, and in 2 years (hypothetically) my dataset is too large and I say to myself "Dangit, Stephen was right...", couldn't I just change the RF to some smaller value, say "3", at that point, or would the Cassandra ring not rebalance the data set nicely at that point?
>
> More specifically, would it not know how best to slowly remove extraneous copies from the nodes and make the data more sparse among the ring members?
>
> Thanks for the hand-holding; it is helping me understand the operational landscape quickly.
>
> -R
>
> On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>> Plan for the future....
>>
>> At some point your data set will become too big for the node that it is running on, or your load will force you to split nodes.... once you do that, RF < N.
>>
>> To solve performance issues with C*, the solution is to add more nodes.
>> To solve storage issues with C*, the solution is to add more nodes.
>> In most cases, the solution in C* is to add more nodes.
>>
>> Don't assume RF = number of nodes as a core design decision of your application and you will not have your ass bitten.
>>
>> ;-)
>>
>> -Stephen
>> P.S. Making the point more extreme to make it clear.
>>
>> On 7 November 2011 15:04, Riyad Kalla <rka...@gmail.com> wrote:
>> > Stephen,
>> >
>> > Excellent breakdown; I appreciate all the detail.
>> >
>> > Your last comment about RF being smaller than N (number of nodes) -- in my particular case my data set isn't particularly large (a few GB) and is distributed globally across a handful of data centers. What I am utilizing Cassandra for is the replication, in order to minimize latency for requests. So when a request comes into any location, I want each node in the ring to contain the full data set so it never needs to defer to another member of the ring to answer a question (even if this means eventual consistency, that is alright in my case).
>> >
>> > Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my Consistency Level with all my writes will *likely* be QUORUM -- I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues.
>> >
>> > Would you agree with this assessment, or am I missing the boat on something?
>> >
>> > Best,
>> > Riyad
>> >
>> > On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>> >> Consistency Level is a pseudo-enum...
>> >>
>> >> You have the choice between:
>> >>
>> >> ONE
>> >> QUORUM (and there are different types of this)
>> >> ALL
>> >>
>> >> At CL=ONE, only one node is guaranteed to have got the write if the operation is a success.
>> >> At CL=ALL, all nodes that the RF says it should be stored at must confirm the write before the operation succeeds, but a partial write will succeed eventually if at least one node recorded the write.
>> >> At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the operation to succeed, otherwise failure, but a partial write will succeed eventually if at least one node recorded the write.
>> >>
>> >> Read repair will eventually ensure that the write is replicated across all RF nodes in the cluster.
>> >>
>> >> The N in QUORUM above depends on the type of QUORUM you choose; in general, think N=RF unless you choose a fancy QUORUM.
>> >>
>> >> To have a consistent read, CL of write + CL of read must be > RF...
>> >>
>> >> Write at ONE, read at ONE => may not get the most recent write if RF > 1 [fastest write, fastest read] {data loss possible if node lost before read repair}
>> >> Write at QUORUM, read at ONE => may not get the most recent write if RF > 2 [moderate write, fastest read] {multiple nodes must be lost for data loss to be possible}
>> >> Write at ALL, read at ONE => consistent read, writes may be blocked if any node fails [slowest write, fastest read]
>> >>
>> >> Write at ONE, read at QUORUM => may not get the most recent write if RF > 2 [fastest write, moderate read] {data loss possible if node lost before read repair}
>> >> Write at QUORUM, read at QUORUM => consistent read [moderate write, moderate read] {multiple nodes must be lost for data loss to be possible}
>> >> Write at ALL, read at QUORUM => consistent read, writes may be blocked if any node fails [slowest write, moderate read]
>> >>
>> >> Write at ONE, read at ALL => consistent read, reads may fail if any node fails [fastest write, slowest read] {data loss possible if node lost before read repair}
>> >> Write at QUORUM, read at ALL => consistent read, reads may fail if any node fails [moderate write, slowest read] {multiple nodes must be lost for data loss to be possible}
>> >> Write at ALL, read at ALL => consistent read, writes may be blocked if any node fails, reads may fail if any node fails [slowest write, slowest read]
>> >>
>> >> Note: you can choose the CL for each and every operation. This is something that you should design into your application (unless you exclusively use QUORUM for all operations, in which case you are advised to bake the logic in, but it is less necessary).
>> >>
>> >> The other thing to remember is that RF does not have to equal the number of nodes in your cluster... in fact, I would recommend designing your app on the basis that RF < number of nodes in your cluster... because at some point, when your data set grows big enough, you will end up with RF < number of nodes.
>> >>
>> >> -Stephen
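Stephen's table follows mechanically from the "write CL + read CL > RF" rule. A minimal sketch, not part of the original thread, that reproduces it for the common single-datacenter case where QUORUM means floor(RF/2) + 1; the helper names are illustrative:

    # Minimal sketch (not from the thread): a read is guaranteed to see the
    # latest write when the replicas touched by the write plus the replicas
    # touched by the read exceed RF. Assumes simple QUORUM = RF // 2 + 1.

    def replicas_required(cl, rf):
        """Replicas that must acknowledge an operation at this consistency level."""
        return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

    def read_is_consistent(write_cl, read_cl, rf):
        """True if a read at read_cl always sees a successful write at write_cl."""
        return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf

    if __name__ == "__main__":
        rf = 5  # e.g. Riyad's five-node ring with RF = number of nodes
        for w in ("ONE", "QUORUM", "ALL"):
            for r in ("ONE", "QUORUM", "ALL"):
                verdict = "consistent" if read_is_consistent(w, r, rf) else "may be stale"
                print(f"write {w:6} / read {r:6} -> {verdict}")

Running it with RF=5 reproduces the nine combinations listed above (ignoring the availability and durability annotations, which the rule does not capture).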
>> >>
>> >> On 7 November 2011 13:03, Riyad Kalla <rka...@gmail.com> wrote:
>> >> > Ah! OK, I was interpreting what you were saying to mean that if my RF was too high, then the ring would die if I lost one node.
>> >> >
>> >> > Ultimately what I want (I think) is:
>> >> > Replication Factor: 5 (aka "all of my nodes")
>> >> > Consistency Level: 2
>> >> >
>> >> > Put another way, when I write a value, I want it to exist on two servers *at least* before I consider that write "successful" enough for my code to continue, but in the background I would like Cassandra to keep copying that value around at its leisure until all the ring nodes know about it.
>> >> >
>> >> > This sounds like what I need. Thanks for pointing me in the right direction.
>> >> >
>> >> > Best,
>> >> > Riyad
>> >> >
>> >> > On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda <anthony.ikeda....@gmail.com> wrote:
>> >> >> Riyad, I'm also just getting to know the different settings and values myself :)
>> >> >>
>> >> >> I believe (and it also depends on your config) that CL.ONE should tolerate the loss of a node if your RF is 5; once you increase the CL, then if you lose a node the CL may not be met and you will get exceptions returned.
>> >> >>
>> >> >> Sent from my iPhone
>> >> >>
>> >> >> On 07/11/2011, at 4:32, Riyad Kalla <rka...@gmail.com> wrote:
>> >> >> Anthony and Jaydeep, thank you for weighing in. I am glad to see that they are two different values (makes more sense mentally to me).
>> >> >>
>> >> >> Anthony, what you said caught my attention: "to ensure all nodes have a copy you may not be able to survive the loss of a single node." -- why would this be the case?
>> >> >>
>> >> >> I assumed (incorrectly?) that a node would simply disappear off the map until I could bring it back up again, at which point it would slowly retrieve all the missing values that it didn't get while it was down from other members of the ring. Is this the wrong understanding?
>> >> >>
>> >> >> If forcing a replication factor equal to the number of nodes in my ring will cause a hard stop when one node goes down (as I understood your comment to mean), it seems to me I should go with a much lower replication factor... something along the lines of 3, or roughly ceiling(N / 2), and just deal with the latency when one of the nodes has to route a request to another server because it doesn't contain the value.
>> >> >>
>> >> >> Is there a better way to accomplish what I want, or is keeping the replication factor that aggressively high generally a bad thing and using Cassandra in the "wrong" way?
>> >> >>
>> >> >> Thank you for the help.
>> >> >> -Riyad
>> >> >>
>> >> >> On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep <chovatia_jayd...@yahoo.co.in> wrote:
>> >> >>> Hi Riyad,
>> >> >>>
>> >> >>> You can set replication = 5 (number of replicas) and write with CL = ONE. There is no hard requirement from Cassandra to write with CL=ALL to replicate the data unless you need it. Considering your example, if you write with CL=ONE it will still replicate your data to all 5 replicas eventually.
>> >> >>>
>> >> >>> Thank you,
>> >> >>> Jaydeep
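Jaydeep's suggestion, pinning RF=5 on the keyspace and choosing the consistency level per operation, looks roughly like the sketch below. It assumes the modern CQL syntax and Python driver (which postdate this thread); the keyspace and table names are hypothetical.

    # Minimal sketch (not from the thread): RF lives on the keyspace, the
    # consistency level is chosen per request.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # RF=5 on a 5-node ring means every node eventually holds every row.
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS app_data "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 5}"
    )
    session.execute(
        "CREATE TABLE IF NOT EXISTS app_data.kv (k text PRIMARY KEY, v text)"
    )

    # Write at CL=ONE: the call returns once a single replica acknowledges it.
    # The coordinator still sends the write to all five replicas, and hinted
    # handoff / read repair / anti-entropy repair cover any replica that missed it.
    write = SimpleStatement(
        "INSERT INTO app_data.kv (k, v) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE,
    )
    session.execute(write, ("greeting", "hello"))
    cluster.shutdown()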
>> >> >>>
>> >> >>> ________________________________
>> >> >>> From: Riyad Kalla <rka...@gmail.com>
>> >> >>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> >> >>> Sent: Sunday, 6 November 2011 9:50 PM
>> >> >>> Subject: Will writes with < ALL consistency eventually propagate?
>> >> >>>
>> >> >>> I am new to Cassandra and was curious about the following scenario...
>> >> >>>
>> >> >>> Let's say I have a ring of 5 servers. Ultimately I would like each server to be a full replication of the next (master-master-*).
>> >> >>>
>> >> >>> In a presentation I watched today on Cassandra, the presenter mentioned that the ring members will shard data and route your requests to the right host when they come in to a server that doesn't physically contain the value you wanted. To the client requesting, this is seamless except for the added latency.
>> >> >>>
>> >> >>> If I wanted to avoid the routing and latency and ensure every server had the full data set, do I have to write with a consistency level of ALL and wait for all of those writes to return in my code, or can I write with a CL of 1 or 2 and let the ring propagate the rest of the copies to the other servers in the background after my code has continued executing?
>> >> >>>
>> >> >>> I don't mind eventual consistency in my case, but I do (eventually) want all nodes to have all values, and I cannot tell if this is default behavior, or if sharding is the default and I can only force duplicates onto the other servers explicitly with a CL of ALL.
>> >> >>>
>> >> >>> Best,
>> >> >>> Riyad
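As the rest of the thread explains, the answer to this original question is that the consistency level is a per-request knob: a write at TWO (the level Riyad settles on upthread) returns once two replicas acknowledge it, and the remaining replicas receive the copy in the background, so CL=ALL is not needed for full replication. A brief sketch under the same assumptions as the earlier snippets (modern Python driver, hypothetical app_data.kv table):

    # Brief sketch (not from the thread): per-request consistency levels.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("app_data")  # hypothetical keyspace from above

    # Waits for 2 of the 5 replicas to acknowledge; the other 3 catch up
    # asynchronously, so no CL=ALL is required for every node to get a copy.
    session.execute(
        SimpleStatement("INSERT INTO kv (k, v) VALUES (%s, %s)",
                        consistency_level=ConsistencyLevel.TWO),
        ("greeting", "hello"),
    )

    # Any node can coordinate the read; at CL=ONE it answers from one replica.
    row = session.execute(
        SimpleStatement("SELECT v FROM kv WHERE k = %s",
                        consistency_level=ConsistencyLevel.ONE),
        ("greeting",),
    ).one()
    print(row.v if row else None)
    cluster.shutdown()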