Plan for the future... at some point your data set will become too big for the nodes it is running on, or your load will force you to split nodes... and once you do that, RF < N.
To solve performance issues with C*, the solution is to add more nodes.
To solve storage issues with C*, the solution is to add more nodes.
In most cases, the solution in C* is to add more nodes.

Don't assume RF = number of nodes as a core design decision of your application, and you will not have your ass bitten ;-)

-Stephen

P.S. I'm making the point more extreme than necessary, just to make it clear.

On 7 November 2011 15:04, Riyad Kalla <rka...@gmail.com> wrote:
> Stephen,
> Excellent breakdown; I appreciate all the detail.
>
> Your last comment about RF being smaller than N (the number of nodes) -- in my particular case my data set isn't particularly large (a few GB) and is distributed globally across a handful of data centers. What I am utilizing Cassandra for is the replication, in order to minimize latency for requests. When a request comes into any location, I want each node in the ring to contain the full data set, so it never needs to defer to another member of the ring to answer a question (even if this means eventual consistency, that is alright in my case).
>
> Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my Consistency Level for all my writes will *likely* be QUORUM -- I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues.
>
> Would you agree with this assessment, or am I missing the boat on something?
>
> Best,
> Riyad
>
> On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>>
>> Consistency Level is a pseudo-enum...
>>
>> You have the choice between:
>>
>> ONE
>> QUORUM (and there are different types of this)
>> ALL
>>
>> At CL=ONE, only one node is guaranteed to have got the write if the operation is a success.
>>
>> At CL=ALL, all nodes that the RF says the data should be stored at must confirm the write before the operation succeeds, but a partial write will still succeed eventually if at least one node recorded the write.
>>
>> At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the operation to succeed, otherwise it fails; a partial write will still succeed eventually if at least one node recorded the write.
>>
>> Read repair will eventually ensure that the write is replicated across all RF nodes in the cluster.
>>
>> The N in QUORUM above depends on the type of QUORUM you choose; in general, think N=RF unless you choose a fancy QUORUM.
>>
>> To have a consistent read, CL of write + CL of read must be > RF...
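(To put numbers on that rule, here is a throwaway sketch -- Java purely for illustration; quorum() and consistentRead() are made-up helper names, not any Cassandra API:)

    // Illustrative only: these helpers just encode the two formulas above.
    public class QuorumMath {

        // Quorum size for a given replication factor: floor(RF/2) + 1
        static int quorum(int rf) {
            return rf / 2 + 1;
        }

        // A read is guaranteed to see the latest write when the write set
        // and the read set must overlap in at least one replica: W + R > RF
        static boolean consistentRead(int w, int r, int rf) {
            return w + r > rf;
        }

        public static void main(String[] args) {
            int rf = 3;
            int q = quorum(rf); // 2
            System.out.println(consistentRead(q, q, rf)); // QUORUM write + QUORUM read: true
            System.out.println(consistentRead(1, 1, rf)); // ONE write + ONE read: false (stale reads possible)
        }
    }

The matrix below simply enumerates that rule for each write/read combination: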
>> Write at ONE, read at ONE => may not get the most recent write if RF > 1 [fastest write, fastest read] {data loss possible if a node is lost before read repair}
>> Write at QUORUM, read at ONE => may not get the most recent write if RF > 2 [moderate write, fastest read] {multiple nodes must be lost for data loss to be possible}
>> Write at ALL, read at ONE => consistent read; writes may be blocked if any node fails [slowest write, fastest read]
>>
>> Write at ONE, read at QUORUM => may not get the most recent write if RF > 2 [fastest write, moderate read] {data loss possible if a node is lost before read repair}
>> Write at QUORUM, read at QUORUM => consistent read [moderate write, moderate read] {multiple nodes must be lost for data loss to be possible}
>> Write at ALL, read at QUORUM => consistent read; writes may be blocked if any node fails [slowest write, moderate read]
>>
>> Write at ONE, read at ALL => consistent read; reads may fail if any node fails [fastest write, slowest read] {data loss possible if a node is lost before read repair}
>> Write at QUORUM, read at ALL => consistent read; reads may fail if any node fails [moderate write, slowest read] {multiple nodes must be lost for data loss to be possible}
>> Write at ALL, read at ALL => consistent read; writes may be blocked if any node fails, and reads may fail if any node fails [slowest write, slowest read]
>>
>> Note: you can choose the CL for each and every operation. This is something that you should design into your application (unless you exclusively use QUORUM for all operations, in which case you are still advised to bake the logic in, but it is less necessary).
>>
>> The other thing to remember is that RF does not have to equal the number of nodes in your cluster... in fact, I would recommend designing your app on the basis that RF < number of nodes in your cluster... because at some point, when your data set grows big enough, you will end up with RF < number of nodes.
>>
>> -Stephen
>>
>> On 7 November 2011 13:03, Riyad Kalla <rka...@gmail.com> wrote:
>> > Ah! Ok, I was interpreting what you were saying to mean that if my RF was too high, then the ring would die if I lost one node.
>> >
>> > Ultimately what I want (I think) is:
>> > Replication Factor: 5 (aka "all of my nodes")
>> > Consistency Level: 2
>> >
>> > Put another way, when I write a value, I want it to exist on two servers *at least* before I consider that write "successful" enough for my code to continue, but in the background I would like Cassandra to keep copying that value around at its leisure until all the ring nodes know about it.
>> >
>> > This sounds like what I need. Thanks for pointing me in the right direction.
>> >
>> > Best,
>> > Riyad
>> >
>> > On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda <anthony.ikeda....@gmail.com> wrote:
>> >>
>> >> Riyad, I'm also just getting to know the different settings and values myself :)
>> >>
>> >> I believe (and it also depends on your config) that CL.ONE should tolerate the loss of a node if your RF is 5; once you increase the CL, then if you lose a node the CL may not be met and you will get exceptions returned.
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On 07/11/2011, at 4:32, Riyad Kalla <rka...@gmail.com> wrote:
>> >>
>> >> Anthony and Jaydeep, thank you for weighing in. I am glad to see that they are two different values (that makes more sense to me mentally).
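(They are indeed two separate knobs: RF is fixed in the keyspace definition, while the CL travels with each individual request. A minimal sketch of that split, assuming the modern DataStax Java driver -- which post-dates this thread -- and made-up keyspace/table names:)

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    public class RfVersusCl {
        public static void main(String[] args) {
            // Assumes a reachable local cluster; connection details omitted.
            try (CqlSession session = CqlSession.builder().build()) {
                // RF: set once, per keyspace, in the schema.
                session.execute(
                    "CREATE KEYSPACE IF NOT EXISTS app WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 5}");

                // CL: chosen per operation, on every statement.
                // (Illustrative table; assumes app.kv was created elsewhere.)
                session.execute(SimpleStatement
                    .builder("INSERT INTO app.kv (k, v) VALUES ('k1', 'v1')")
                    .setConsistencyLevel(DefaultConsistencyLevel.QUORUM)
                    .build());
            }
        }
    }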
>> >> Anthony, what you said caught my attention: "to ensure all nodes have a copy you may not be able to survive the loss of a single node" -- why would this be the case?
>> >>
>> >> I assumed (incorrectly?) that a node would simply disappear off the map until I could bring it back up again, at which point it would slowly retrieve from the other members of the ring all the values it missed while it was down. Is this the wrong understanding?
>> >>
>> >> If forcing a replication factor equal to the number of nodes in my ring will cause a hard stop when one node goes down (as I understood your comment to mean), it seems to me I should go with a much lower replication factor... something along the lines of 3, or roughly ceiling(N/2), and just deal with the latency when one of the nodes has to route a request to another server because it doesn't contain the value.
>> >>
>> >> Is there a better way to accomplish what I want, or is keeping the replication factor that aggressively high generally a bad thing and using Cassandra in the "wrong" way?
>> >>
>> >> Thank you for the help.
>> >>
>> >> -Riyad
>> >>
>> >> On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep <chovatia_jayd...@yahoo.co.in> wrote:
>> >>>
>> >>> Hi Riyad,
>> >>> You can set replication = 5 (the number of replicas) and write with CL = ONE. There is no hard requirement from Cassandra to write with CL=ALL to replicate the data unless you need it. Considering your example: if you write with CL=ONE, it will still replicate your data to all 5 replicas eventually.
>> >>>
>> >>> Thank you,
>> >>> Jaydeep
>> >>> ________________________________
>> >>> From: Riyad Kalla <rka...@gmail.com>
>> >>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> >>> Sent: Sunday, 6 November 2011 9:50 PM
>> >>> Subject: Will writes with < ALL consistency eventually propagate?
>> >>>
>> >>> I am new to Cassandra and was curious about the following scenario...
>> >>>
>> >>> Let's say I have a ring of 5 servers. Ultimately I would like each server to be a full replica of the next (master-master-*).
>> >>>
>> >>> In a presentation I watched today on Cassandra, the presenter mentioned that the ring members will shard data and route your requests to the right host when they come in to a server that doesn't physically contain the value you wanted. To the client making the request this is seamless, except for the added latency.
>> >>>
>> >>> If I wanted to avoid the routing and latency and ensure every server had the full data set, do I have to write with a consistency level of ALL and wait for all of those writes to return in my code, or can I write with a CL of 1 or 2 and let the ring propagate the rest of the copies to the other servers in the background after my code has continued executing?
>> >>>
>> >>> I don't mind eventual consistency in my case, but I do (eventually) want all nodes to have all values, and I cannot tell whether this is the default behavior, or whether sharding is the default and I can only force duplicates onto the other servers explicitly with a CL of ALL.
>> >>>
>> >>> Best,
>> >>> Riyad
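(Closing the loop on the original question: yes, a write below ALL still reaches every replica eventually; read repair and the normal background replication mentioned above take care of the rest. The "on two servers *at least* before my code continues" behaviour Riyad described is just a write at CL TWO. Another illustrative sketch, under the same modern-driver assumption and made-up names as above:)

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    public class WriteAtTwo {
        public static void main(String[] args) {
            // Assumes a reachable local cluster and an existing app.kv table.
            try (CqlSession session = CqlSession.builder().build()) {
                // With RF=5, this returns once 2 replicas confirm the write;
                // the other 3 are brought up to date in the background.
                session.execute(SimpleStatement
                    .builder("INSERT INTO app.kv (k, v) VALUES ('greeting', 'hello')")
                    .setConsistencyLevel(DefaultConsistencyLevel.TWO)
                    .build());
            }
        }
    }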