Stephen,

Excellent breakdown; I appreciate all the detail.

Regarding your last comment about RF being smaller than N (the number of
nodes): in my particular case the data set isn't very large (a few GB), and
it is distributed globally across a handful of data centers. What I am using
Cassandra for is its replication, in order to minimize request latency.

So when a request comes into any location, I want each node in the ring to
contain the full data set, so that it never needs to defer to another member
of the ring to answer a question (even if this means eventual consistency,
that is alright in my case).
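
For concreteness, here is roughly the keyspace shape I'm picturing (just a
sketch in DataStax Java driver syntax; the keyspace name, data center names
and per-DC counts below are invented -- the idea is that each DC's replica
count equals the number of nodes in that DC, so RF = N and every node holds
the full data set):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CreateGlobalKeyspace {
        public static void main(String[] args) {
            // Hypothetical contact point; any node in the ring would do.
            Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
            Session session = cluster.connect();

            // Replica count per DC == node count per DC, so every node
            // ends up owning a full copy of the data.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS global WITH replication = "
              + "{'class': 'NetworkTopologyStrategy', "
              + "'us_east': 2, 'eu_west': 2, 'ap_south': 1}");

            cluster.close();
        }
    }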

Given that, the way I've understood this discussion so far is that I would
have an RF of N (my total node count), but my Consistency Level for all my
writes will *likely* be QUORUM. I think that is a good/safe default for me,
as writes aren't the scenario I need to optimize for latency; that being
said, I also don't want to wait for a ConsistencyLevel of ALL to complete
before my code continues.
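
Just to check my QUORUM arithmetic: with RF = 5, QUORUM is (5/2) + 1 = 3
(integer division), so a write blocks until 3 replicas ack and the remaining
2 copies catch up in the background. In code I'm imagining something like
the following (again only a sketch in DataStax Java driver syntax; the
keyspace, table and column names are invented):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class QuorumWriteOneRead {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
            Session session = cluster.connect("global"); // keyspace name is an assumption

            // Write at QUORUM: with RF = 5, three replicas must ack before the
            // call returns; the other two copies are propagated in the background.
            SimpleStatement write = new SimpleStatement(
                    "INSERT INTO kv (key, value) VALUES (?, ?)", "some-key", "some-value");
            write.setConsistencyLevel(ConsistencyLevel.QUORUM);
            session.execute(write);

            // Read at ONE: a single replica can answer, and since every node
            // holds the full data set a nearby replica should be able to serve it.
            SimpleStatement read = new SimpleStatement(
                    "SELECT value FROM kv WHERE key = ?", "some-key");
            read.setConsistencyLevel(ConsistencyLevel.ONE);
            ResultSet rs = session.execute(read);
            System.out.println(rs.one().getString("value"));

            cluster.close();
        }
    }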

Would you agree with this assessment or am I missing the boat on something?

Best,
Riyad

On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:

> Consistency Level is a pseudo-enum...
>
> you have the choice between
>
> ONE
> QUORUM (and there are different types of this)
> ALL
>
> At CL=ONE, only one node is guaranteed to have got the write if the
> operation is a success.
> At CL=ALL, all nodes that the RF says it should be stored at must
> confirm the write before the operation succeeds, but a partial write
> will succeed eventually if at least one node recorded the write.
> At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
> operation to succeed, otherwise failure, but a partial write will
> succeed eventually if at least one node recorded the write.
>
> Read repair will eventually ensure that the write is replicated across
> all RF nodes in the cluster.
>
> The N in QUORUM above depends on the type of QUORUM you choose, in
> general think N=RF unless you choose a fancy QUORUM.
>
> To have a consistent read, CL of write + CL of read must be > RF...
>
> Write at ONE, read at ONE => may not get the most recent write if RF >
> 1 [fastest write, fastest read] {data loss possible if node lost
> before read repair}
> Write at QUORUM, read at ONE => consistent read [moderate write,
> fastest read] {multiple nodes must be lost for data loss to be
> possible}
> Write at ALL, read at ONE => consistent read, writes may be blocked if
> any node fails [slowest write, fastest read]
>
> Write at ONE, read at QUORUM => may not get the most recent write if
> RF > 2 [fastest write, moderate read]  {data loss possible if node
> lost before read repair}
> Write at QUORUM, read at QUORUM => consistent read [moderate write,
> moderate read] {multiple nodes must be lost for data loss to be
> possible}
> Write at ALL, read at QUORUM => consistent read, writes may be blocked
> if any node fails [slowest write, moderate read]
>
> Write at ONE, read at ALL => consistent read, reads may fail if any
> node fails [fastest write, slowest read] {data loss possible if node
> lost before read repair}
> Write at QUORUM, read at ALL => consistent read, reads may fail if any
> node fails [moderate write, slowest read] {multiple nodes must be lost
> for data loss to be possible}
> Write at ALL, read at ALL => consistent read, writes may be blocked if
> any node fails, reads may fail if any node fails [slowest write,
> slowest read]
>
> Note: You can choose the CL for each and every operation. This is
> something that you should design into your application (unless you
> exclusively use QUORUM for all operations, in which case you are
> advised to bake the logic in, but it is less necessary)
>
> The other thing to remember is that RF does not have to equal the
> number of nodes in your cluster... in fact I would recommend designing
> your app on the basis that RF < number of nodes in your cluster...
> because at some point, when your data set grows big enough, you will
> end up with RF < number of nodes.
>
> -Stephen
>
> On 7 November 2011 13:03, Riyad Kalla <rka...@gmail.com> wrote:
> > Ah! Ok I was interpreting what you were saying to mean that if my RF
> > was too high, then the ring would die if I lost one.
> > Ultimately what I want (I think) is:
> > Replication Factor: 5 (aka "all of my nodes")
> > Consistency Level: 2
> > Put another way, when I write a value, I want it to exist on two
> > servers *at least* before I consider that write "successful" enough for
> > my code to continue, but in the background I would like Cassandra to
> > keep copying that value around at its leisure until all the ring nodes
> > know about it.
> > This sounds like what I need. Thanks for pointing me in the right
> > direction.
> > Best,
> > Riyad
> >
> > On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda
> > <anthony.ikeda....@gmail.com> wrote:
> >>
> >> Riyad, I'm also just getting to know the different settings and values
> >> myself :)
> >> I believe (and it also depends on your config) CL.ONE should ignore the
> >> loss of a node if your RF is 5; once you increase the CL, then if you
> >> lose a node the CL is not met and you will get exceptions returned.
> >> Sent from my iPhone
> >> On 07/11/2011, at 4:32, Riyad Kalla <rka...@gmail.com> wrote:
> >>
> >> Anthony and Jaydeep, thank you for weighing in. I am glad to see that
> >> they are two different values (makes more sense mentally to me).
> >> Anthony, what you said caught my attention: "to ensure all nodes have a
> >> copy you may not be able to survive the loss of a single node" -- why
> >> would this be the case?
> >> I assumed (incorrectly?) that a node would simply disappear off the map
> >> until I could bring it back up again, at which point it would slowly
> >> retrieve from other members of the ring all the missing values that it
> >> didn't get while it was down. Is this the wrong understanding?
> >> If forcing a replication factor equal to the number of nodes in my ring
> >> will cause a hard-stop when one node goes down (as I understood your
> >> comment to mean), it seems to me I should go with a much lower
> >> replication factor... something along the lines of 3 or roughly
> >> ceiling(N / 2), and just deal with the latency when one of the nodes
> >> has to route a request to another server when it doesn't contain the
> >> value.
> >> Is there a better way to accomplish what I want, or is keeping the
> >> replication factor that aggressively high generally a bad thing and
> >> using Cassandra in the "wrong" way?
> >> Thank you for the help.
> >> -Riyad
> >>
> >> On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep
> >> <chovatia_jayd...@yahoo.co.in> wrote:
> >>>
> >>> Hi Riyad,
> >>> You can set replication = 5 (number of replicas) and write with CL =
> >>> ONE. There is no hard requirement from Cassandra to write with CL=ALL
> >>> to replicate the data unless you need it. Considering your example, if
> >>> you write with CL=ONE it will still replicate your data to all 5
> >>> replicas eventually.
> >>> Thank you,
> >>> Jaydeep
> >>> ________________________________
> >>> From: Riyad Kalla <rka...@gmail.com>
> >>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> >>> Sent: Sunday, 6 November 2011 9:50 PM
> >>> Subject: Will writes with < ALL consistency eventually propagate?
> >>>
> >>> I am new to Cassandra and was curious about the following scenario...
> >>>
> >>> Let's say I have a ring of 5 servers. Ultimately I would like each
> >>> server to be a full replication of the next (master-master-*).
> >>>
> >>> In a presentation I watched today on Cassandra, the presenter
> >>> mentioned that the ring members will shard data and route your
> >>> requests to the right host when they come in to a server that doesn't
> >>> physically contain the value you wanted. To the requesting client this
> >>> is seamless except for the added latency.
> >>>
> >>> If I wanted to avoid the routing and latency and ensure every server
> >>> had the full data set, do I have to write with a consistency level of
> >>> ALL and wait for all of those writes to return in my code, or can I
> >>> write with a CL of 1 or 2 and let the ring propagate the rest of the
> >>> copies to the other servers in the background after my code has
> >>> continued executing?
> >>>
> >>> I don't mind eventual consistency in my case, but I do (eventually)
> >>> want all nodes to have all values, and cannot tell if this is the
> >>> default behavior, or if sharding is the default and I can only force
> >>> duplicates onto the other servers explicitly with a CL of ALL.
> >>>
> >>> Best,
> >>> Riyad
> >>>
> >>
> >
> >
>
