Stephen,

Excellent breakdown; I appreciate all the detail.

Your last comment about RF being smaller than N (the number of nodes) -- in my particular case my data set isn't particularly large (a few GB) and is distributed globally across a handful of data centers. What I am using Cassandra for is the replication, in order to minimize latency for requests. When a request comes into any location, I want each node in the ring to contain the full data set so it never needs to defer to another member of the ring to answer a question (even if that means eventual consistency, which is alright in my case).
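For concreteness, here is roughly how I picture setting that up. This is just a sketch using the modern DataStax Python driver purely for illustration (the contact point, keyspace name, table, data center names, and per-DC node counts are placeholders, not my real topology); I'll map the idea onto whatever client library I actually end up using:

    from cassandra.cluster import Cluster

    # One contact point is enough; the driver discovers the rest of the ring.
    cluster = Cluster(["10.0.0.1"])
    session = cluster.connect()

    # "RF = N": with NetworkTopologyStrategy, making each data center's
    # replica count equal to the number of nodes in that data center means
    # every node ends up holding the full data set.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS app_ks
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'us_east': 2,
            'eu_west': 2,
            'ap_south': 1
        }
    """)

    # A small table standing in for the few GB of globally replicated data.
    session.execute("""
        CREATE TABLE IF NOT EXISTS app_ks.settings (
            key   text PRIMARY KEY,
            value text
        )
    """)

With that in place, any node can answer a read from its own copy instead of forwarding the request to another member of the ring.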
Given that, the way I've understood this discussion so far is that I would use an RF of N (my total node count), but the Consistency Level on all my writes will *likely* be QUORUM. I think that is a good/safe default for me, since writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues.
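In code I'm imagining the consistency level being chosen per operation, roughly like the sketch below (again the DataStax Python driver, and the placeholder keyspace/table from the sketch above, purely for illustration):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["10.0.0.1"])
    session = cluster.connect()

    # Writes: wait for a quorum of replicas (3 of my 5) before my code
    # continues, without blocking on all 5 the way ConsistencyLevel.ALL would.
    write = SimpleStatement(
        "INSERT INTO app_ks.settings (key, value) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    session.execute(write, ("greeting", "hello"))

    # Reads: answered by a single replica for the lowest latency. With RF = 5,
    # QUORUM writes (3) + ONE reads (1) is 4, which is not > 5, so a read can
    # briefly miss the newest write until replication/read repair catches up;
    # that is acceptable for me since eventual consistency is fine.
    read = SimpleStatement(
        "SELECT value FROM app_ks.settings WHERE key = %s",
        consistency_level=ConsistencyLevel.ONE,
    )
    row = session.execute(read, ("greeting",)).one()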
Would you agree with this assessment, or am I missing the boat on something?

Best,
Riyad

On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
> Consistency Level is a pseudo-enum...
>
> you have the choice between:
>
> ONE
> QUORUM (and there are different types of this)
> ALL
>
> At CL=ONE, only one node is guaranteed to have got the write if the operation is a success.
> At CL=ALL, all nodes that the RF says it should be stored at must confirm the write before the operation succeeds, but a partial write will succeed eventually if at least one node recorded the write.
> At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the operation to succeed, otherwise failure, but a partial write will succeed eventually if at least one node recorded the write.
>
> Read repair will eventually ensure that the write is replicated across all RF nodes in the cluster.
>
> The N in QUORUM above depends on the type of QUORUM you choose; in general think N=RF unless you choose a fancy QUORUM.
>
> To have a consistent read, CL of write + CL of read must be > RF...
>
> Write at ONE, read at ONE => may not get the most recent write if RF > 1 [fastest write, fastest read] {data loss possible if node lost before read repair}
> Write at QUORUM, read at ONE => consistent read [moderate write, fastest read] {multiple nodes must be lost for data loss to be possible}
> Write at ALL, read at ONE => consistent read, writes may be blocked if any node fails [slowest write, fastest read]
>
> Write at ONE, read at QUORUM => may not get the most recent write if RF > 2 [fastest write, moderate read] {data loss possible if node lost before read repair}
> Write at QUORUM, read at QUORUM => consistent read [moderate write, moderate read] {multiple nodes must be lost for data loss to be possible}
> Write at ALL, read at QUORUM => consistent read, writes may be blocked if any node fails [slowest write, moderate read]
>
> Write at ONE, read at ALL => consistent read, reads may fail if any node fails [fastest write, slowest read] {data loss possible if node lost before read repair}
> Write at QUORUM, read at ALL => consistent read, reads may fail if any node fails [moderate write, slowest read] {multiple nodes must be lost for data loss to be possible}
> Write at ALL, read at ALL => consistent read, writes may be blocked if any node fails, reads may fail if any node fails [slowest write, slowest read]
>
> Note: You can choose the CL for each and every operation. This is something that you should design into your application (unless you exclusively use QUORUM for all operations, in which case you are advised to bake the logic in, but it is less necessary).
>
> The other thing to remember is that RF does not have to equal the number of nodes in your cluster... in fact I would recommend designing your app on the basis that RF < number of nodes in your cluster... because at some point, when your data set grows big enough, you will end up with RF < number of nodes.
>
> -Stephen
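Checking my own numbers against your "CL of write + CL of read must be > RF" rule (if I'm reading the matrix right, the "Write at QUORUM, read at ONE => consistent read" row only strictly holds when RF <= 2), this is the back-of-the-envelope arithmetic I'm doing -- plain Python, nothing Cassandra-specific, just to make the overlap rule concrete:

    def quorum(rf):
        # "((N/2)+1)" from the breakdown above, with N = RF and integer division.
        return rf // 2 + 1

    def overlapping(rf, write_nodes, read_nodes):
        # A read is guaranteed to see the latest write when the two
        # consistency levels are forced to overlap: write CL + read CL > RF.
        return write_nodes + read_nodes > rf

    RF = 5                    # every one of my 5 nodes holds a full copy
    W = quorum(RF)            # QUORUM writes -> 3 acknowledgements

    print(overlapping(RF, W, 1))           # QUORUM write + ONE read    -> False
    print(overlapping(RF, W, quorum(RF)))  # QUORUM write + QUORUM read -> True
    print(overlapping(RF, W, RF))          # QUORUM write + ALL read    -> True

Since I'm fine with eventual consistency, the False case is acceptable for my ordinary reads, and any read that truly must be consistent can be issued at QUORUM on a per-operation basis.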
> On 7 November 2011 13:03, Riyad Kalla <rka...@gmail.com> wrote:
> > Ah! OK, I was interpreting what you were saying to mean that if my RF was too high, then the ring would die if I lost one node.
> >
> > Ultimately what I want (I think) is:
> > Replication Factor: 5 (aka "all of my nodes")
> > Consistency Level: 2
> >
> > Put another way, when I write a value, I want it to exist on two servers *at least* before I consider that write "successful" enough for my code to continue, but in the background I would like Cassandra to keep copying that value around at its leisure until all the ring nodes know about it.
> >
> > This sounds like what I need. Thanks for pointing me in the right direction.
> >
> > Best,
> > Riyad
> >
> > On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda <anthony.ikeda....@gmail.com> wrote:
> >>
> >> Riyad, I'm also just getting to know the different settings and values myself :)
> >>
> >> I believe, and it also depends on your config, that CL.ONE should ignore the loss of a node if your RF is 5; once you increase the CL, then if you lose a node the CL is not met and you will get exceptions returned.
> >>
> >> Sent from my iPhone
> >>
> >> On 07/11/2011, at 4:32, Riyad Kalla <rka...@gmail.com> wrote:
> >>
> >> Anthony and Jaydeep, thank you for weighing in. I am glad to see that they are two different values (makes more sense mentally to me).
> >>
> >> Anthony, what you said caught my attention: "to ensure all nodes have a copy you may not be able to survive the loss of a single node" -- why would this be the case?
> >>
> >> I assumed (incorrectly?) that a node would simply disappear off the map until I could bring it back up again, at which point it would slowly retrieve from the other members of the ring all the missing values it didn't get while it was down. Is this the wrong understanding?
> >>
> >> If forcing a replication factor equal to the number of nodes in my ring will cause a hard stop when one ring member goes down (as I understood your comment to mean), it seems to me I should go with a much lower replication factor... something along the lines of 3, or roughly ceiling(N / 2), and just deal with the latency when one of the nodes has to route a request to another server because it doesn't contain the value.
> >>
> >> Is there a better way to accomplish what I want, or is keeping the replication factor that aggressively high generally a bad thing and using Cassandra in the "wrong" way?
> >>
> >> Thank you for the help.
> >>
> >> -Riyad
> >>
> >> On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep <chovatia_jayd...@yahoo.co.in> wrote:
> >>>
> >>> Hi Riyad,
> >>>
> >>> You can set replication = 5 (the number of replicas) and write with CL = ONE. There is no hard requirement from Cassandra to write with CL=ALL to replicate the data unless you need it. Considering your example, if you write with CL=ONE it will still replicate your data to all 5 replicas eventually.
> >>>
> >>> Thank you,
> >>> Jaydeep
> >>>
> >>> ________________________________
> >>> From: Riyad Kalla <rka...@gmail.com>
> >>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> >>> Sent: Sunday, 6 November 2011 9:50 PM
> >>> Subject: Will writes with < ALL consistency eventually propagate?
> >>>
> >>> I am new to Cassandra and was curious about the following scenario...
> >>>
> >>> Let's say I have a ring of 5 servers. Ultimately I would like each server to be a full replication of the next (master-master-*).
> >>>
> >>> In a presentation I watched today on Cassandra, the presenter mentioned that the ring members will shard data and route your requests to the right host when they come in to a server that doesn't physically contain the value you wanted. To the client requesting, this is seamless except for the added latency.
> >>>
> >>> If I wanted to avoid the routing and latency and ensure every server had the full data set, do I have to write with a consistency level of ALL and wait for all of those writes to return in my code, or can I write with a CL of 1 or 2 and let the ring propagate the rest of the copies to the other servers in the background after my code has continued executing?
> >>>
> >>> I don't mind eventual consistency in my case, but I do (eventually) want all nodes to have all values, and I cannot tell if this is the default behavior, or if sharding is the default and I can only force duplicates onto the other servers explicitly with a CL of ALL.
> >>>
> >>> Best,
> >>> Riyad