At that point, your cluster will either have so much data on each node that you will need to split them, keeping RF=5 so you have 10 nodes... or the intra-cluster traffic will swamp you and you will split each node, keeping RF=5 so you have 10 nodes again.

The safest thing is not to design with the assumption that RF=N.

-Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using Swype to type on the screen.
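Riyad's question quoted below, whether the RF can simply be dropped to a smaller value later, comes down to a keyspace schema change plus a cleanup pass on each node. A minimal sketch, assuming the modern CQL syntax and Python driver (both postdate this 2011 thread) and a hypothetical keyspace named app_data:

    # Minimal sketch (not from the thread): lowering a keyspace's replication
    # factor. The contact point and keyspace name are illustrative.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # Dropping RF from 5 to 3 is only a metadata change on the keyspace.
    session.execute(
        "ALTER KEYSPACE app_data "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
    )

    # Each node still holds the surplus copies until they are reclaimed,
    # e.g. by running `nodetool cleanup` on every node in the ring.
    cluster.shutdown()

Going the other way (raising RF) is also just a schema change, but it should be followed by a repair so the newly responsible replicas actually receive the existing data.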
On 7 Nov 2011 17:47, "Riyad Kalla" <rka...@gmail.com> wrote:
> Stephen,
>
> I appreciate you making the point more strongly; I won't make this decision lightly given the stress you are putting on it, but the technical aspects of this make me curious...
>
> If I start with RF=N (number of nodes) now, and in 2 years (hypothetically) my dataset is too large and I say to myself "Dangit, Stephen was right...", couldn't I just change the RF to some smaller value, say "3", at that point, or would the Cassandra ring not rebalance the data set nicely at that point?
>
> More specifically, would it not know how best to slowly remove extraneous copies from the nodes and make the data more sparse among the ring members?
>
> Thanks for the hand-holding; it is helping me understand the operational landscape quickly.
>
> -R
>
> On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>> Plan for the future....
>>
>> At some point your data set will become too big for the node that it is running on, or your load will force you to split nodes.... once you do that, RF < N.
>>
>> To solve performance issues with C*, the solution is to add more nodes.
>> To solve storage issues with C*, the solution is to add more nodes.
>> In most cases, the solution in C* is to add more nodes.
>>
>> Don't assume RF = number of nodes as a core design decision of your application and you will not have your ass bitten.
>>
>> ;-)
>>
>> -Stephen
>> P.S. Making the point more extreme to make it clear.
>>
>> On 7 November 2011 15:04, Riyad Kalla <rka...@gmail.com> wrote:
>> > Stephen,
>> >
>> > Excellent breakdown; I appreciate all the detail.
>> >
>> > Your last comment about RF being smaller than N (number of nodes) -- in my particular case my data set isn't particularly large (a few GB) and is distributed globally across a handful of data centers. What I am utilizing Cassandra for is the replication, in order to minimize latency for requests. So when a request comes into any location, I want each node in the ring to contain the full data set so it never needs to defer to another member of the ring to answer a question (even if this means eventual consistency, that is alright in my case).
>> >
>> > Given that, the way I've understood this discussion so far is that I would have an RF of N (my total node count), but my Consistency Level with all my writes will *likely* be QUORUM -- I think that is a good/safe default for me to use, as writes aren't the scenario I need to optimize for latency; that being said, I also don't want to wait for a ConsistencyLevel of ALL to complete before my code continues.
>> >
>> > Would you agree with this assessment, or am I missing the boat on something?
>> >
>> > Best,
>> > Riyad
>> >
>> > On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>> >> Consistency Level is a pseudo-enum...
>> >>
>> >> You have the choice between:
>> >>
>> >> ONE
>> >> QUORUM (and there are different types of this)
>> >> ALL
>> >>
>> >> At CL=ONE, only one node is guaranteed to have got the write if the operation is a success.
>> >> At CL=ALL, all nodes that the RF says it should be stored at must confirm the write before the operation succeeds, but a partial write will succeed eventually if at least one node recorded the write.
>> >> At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the operation to succeed, otherwise failure, but a partial write will succeed eventually if at least one node recorded the write.
>> >>
>> >> Read repair will eventually ensure that the write is replicated across all RF nodes in the cluster.
>> >>
>> >> The N in QUORUM above depends on the type of QUORUM you choose; in general, think N=RF unless you choose a fancy QUORUM.
>> >>
>> >> To have a consistent read, CL of write + CL of read must be > RF...
>> >>
>> >> Write at ONE, read at ONE => may not get the most recent write if RF > 1 [fastest write, fastest read] {data loss possible if node lost before read repair}
>> >> Write at QUORUM, read at ONE => may not get the most recent write if RF > 2 [moderate write, fastest read] {multiple nodes must be lost for data loss to be possible}
>> >> Write at ALL, read at ONE => consistent read, writes may be blocked if any node fails [slowest write, fastest read]
>> >>
>> >> Write at ONE, read at QUORUM => may not get the most recent write if RF > 2 [fastest write, moderate read] {data loss possible if node lost before read repair}
>> >> Write at QUORUM, read at QUORUM => consistent read [moderate write, moderate read] {multiple nodes must be lost for data loss to be possible}
>> >> Write at ALL, read at QUORUM => consistent read, writes may be blocked if any node fails [slowest write, moderate read]
>> >>
>> >> Write at ONE, read at ALL => consistent read, reads may fail if any node fails [fastest write, slowest read] {data loss possible if node lost before read repair}
>> >> Write at QUORUM, read at ALL => consistent read, reads may fail if any node fails [moderate write, slowest read] {multiple nodes must be lost for data loss to be possible}
>> >> Write at ALL, read at ALL => consistent read, writes may be blocked if any node fails, reads may fail if any node fails [slowest write, slowest read]
>> >>
>> >> Note: you can choose the CL for each and every operation. This is something that you should design into your application (unless you exclusively use QUORUM for all operations, in which case you are advised to bake the logic in, but it is less necessary).
>> >>
>> >> The other thing to remember is that RF does not have to equal the number of nodes in your cluster... in fact, I would recommend designing your app on the basis that RF < number of nodes in your cluster... because at some point, when your data set grows big enough, you will end up with RF < number of nodes.
>> >>
>> >> -Stephen
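Stephen's table follows mechanically from the "write CL + read CL > RF" rule. A minimal sketch, not part of the original thread, that reproduces it for the common single-datacenter case where QUORUM means floor(RF/2) + 1; the helper names are illustrative:

    # Minimal sketch (not from the thread): a read is guaranteed to see the
    # latest write when the replicas touched by the write plus the replicas
    # touched by the read exceed RF. Assumes simple QUORUM = RF // 2 + 1.

    def replicas_required(cl, rf):
        """Replicas that must acknowledge an operation at this consistency level."""
        return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

    def read_is_consistent(write_cl, read_cl, rf):
        """True if a read at read_cl always sees a successful write at write_cl."""
        return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf

    if __name__ == "__main__":
        rf = 5  # e.g. Riyad's five-node ring with RF = number of nodes
        for w in ("ONE", "QUORUM", "ALL"):
            for r in ("ONE", "QUORUM", "ALL"):
                verdict = "consistent" if read_is_consistent(w, r, rf) else "may be stale"
                print(f"write {w:6} / read {r:6} -> {verdict}")

Running it with RF=5 reproduces the nine combinations listed above (ignoring the availability and durability annotations, which the rule does not capture).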
>> >>
>> >> On 7 November 2011 13:03, Riyad Kalla <rka...@gmail.com> wrote:
>> >> > Ah! OK, I was interpreting what you were saying to mean that if my RF was too high, then the ring would die if I lost one node.
>> >> >
>> >> > Ultimately what I want (I think) is:
>> >> > Replication Factor: 5 (aka "all of my nodes")
>> >> > Consistency Level: 2
>> >> >
>> >> > Put another way, when I write a value, I want it to exist on two servers *at least* before I consider that write "successful" enough for my code to continue, but in the background I would like Cassandra to keep copying that value around at its leisure until all the ring nodes know about it.
>> >> >
>> >> > This sounds like what I need. Thanks for pointing me in the right direction.
>> >> >
>> >> > Best,
>> >> > Riyad
>> >> >
>> >> > On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda <anthony.ikeda....@gmail.com> wrote:
>> >> >> Riyad, I'm also just getting to know the different settings and values myself :)
>> >> >>
>> >> >> I believe (and it also depends on your config) that CL.ONE should tolerate the loss of a node if your RF is 5; once you increase the CL, then if you lose a node the CL may not be met and you will get exceptions returned.
>> >> >>
>> >> >> Sent from my iPhone
>> >> >>
>> >> >> On 07/11/2011, at 4:32, Riyad Kalla <rka...@gmail.com> wrote:
>> >> >> Anthony and Jaydeep, thank you for weighing in. I am glad to see that they are two different values (makes more sense mentally to me).
>> >> >>
>> >> >> Anthony, what you said caught my attention: "to ensure all nodes have a copy you may not be able to survive the loss of a single node." -- why would this be the case?
>> >> >>
>> >> >> I assumed (incorrectly?) that a node would simply disappear off the map until I could bring it back up again, at which point it would slowly retrieve all the missing values that it didn't get while it was down from other members of the ring. Is this the wrong understanding?
>> >> >>
>> >> >> If forcing a replication factor equal to the number of nodes in my ring will cause a hard stop when one node goes down (as I understood your comment to mean), it seems to me I should go with a much lower replication factor... something along the lines of 3, or roughly ceiling(N / 2), and just deal with the latency when one of the nodes has to route a request to another server because it doesn't contain the value.
>> >> >>
>> >> >> Is there a better way to accomplish what I want, or is keeping the replication factor that aggressively high generally a bad thing and using Cassandra in the "wrong" way?
>> >> >>
>> >> >> Thank you for the help.
>> >> >> -Riyad
>> >> >>
>> >> >> On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep <chovatia_jayd...@yahoo.co.in> wrote:
>> >> >>> Hi Riyad,
>> >> >>>
>> >> >>> You can set replication = 5 (number of replicas) and write with CL = ONE. There is no hard requirement from Cassandra to write with CL=ALL to replicate the data unless you need it. Considering your example, if you write with CL=ONE it will still replicate your data to all 5 replicas eventually.
>> >> >>>
>> >> >>> Thank you,
>> >> >>> Jaydeep
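Jaydeep's suggestion, pinning RF=5 on the keyspace and choosing the consistency level per operation, looks roughly like the sketch below. It assumes the modern CQL syntax and Python driver (which postdate this thread); the keyspace and table names are hypothetical.

    # Minimal sketch (not from the thread): RF lives on the keyspace, the
    # consistency level is chosen per request.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # RF=5 on a 5-node ring means every node eventually holds every row.
    session.execute(
        "CREATE KEYSPACE IF NOT EXISTS app_data "
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 5}"
    )
    session.execute(
        "CREATE TABLE IF NOT EXISTS app_data.kv (k text PRIMARY KEY, v text)"
    )

    # Write at CL=ONE: the call returns once a single replica acknowledges it.
    # The coordinator still sends the write to all five replicas, and hinted
    # handoff / read repair / anti-entropy repair cover any replica that missed it.
    write = SimpleStatement(
        "INSERT INTO app_data.kv (k, v) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE,
    )
    session.execute(write, ("greeting", "hello"))
    cluster.shutdown()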
>> >> >>>
>> >> >>> ________________________________
>> >> >>> From: Riyad Kalla <rka...@gmail.com>
>> >> >>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> >> >>> Sent: Sunday, 6 November 2011 9:50 PM
>> >> >>> Subject: Will writes with < ALL consistency eventually propagate?
>> >> >>>
>> >> >>> I am new to Cassandra and was curious about the following scenario...
>> >> >>>
>> >> >>> Let's say I have a ring of 5 servers. Ultimately I would like each server to be a full replication of the next (master-master-*).
>> >> >>>
>> >> >>> In a presentation I watched today on Cassandra, the presenter mentioned that the ring members will shard data and route your requests to the right host when they come in to a server that doesn't physically contain the value you wanted. To the client requesting, this is seamless except for the added latency.
>> >> >>>
>> >> >>> If I wanted to avoid the routing and latency and ensure every server had the full data set, do I have to write with a consistency level of ALL and wait for all of those writes to return in my code, or can I write with a CL of 1 or 2 and let the ring propagate the rest of the copies to the other servers in the background after my code has continued executing?
>> >> >>>
>> >> >>> I don't mind eventual consistency in my case, but I do (eventually) want all nodes to have all values, and I cannot tell if this is default behavior, or if sharding is the default and I can only force duplicates onto the other servers explicitly with a CL of ALL.
>> >> >>>
>> >> >>> Best,
>> >> >>> Riyad
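As the rest of the thread explains, the answer to this original question is that the consistency level is a per-request knob: a write at TWO (the level Riyad settles on upthread) returns once two replicas acknowledge it, and the remaining replicas receive the copy in the background, so CL=ALL is not needed for full replication. A brief sketch under the same assumptions as the earlier snippets (modern Python driver, hypothetical app_data.kv table):

    # Brief sketch (not from the thread): per-request consistency levels.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("app_data")  # hypothetical keyspace from above

    # Waits for 2 of the 5 replicas to acknowledge; the other 3 catch up
    # asynchronously, so no CL=ALL is required for every node to get a copy.
    session.execute(
        SimpleStatement("INSERT INTO kv (k, v) VALUES (%s, %s)",
                        consistency_level=ConsistencyLevel.TWO),
        ("greeting", "hello"),
    )

    # Any node can coordinate the read; at CL=ONE it answers from one replica.
    row = session.execute(
        SimpleStatement("SELECT v FROM kv WHERE k = %s",
                        consistency_level=ConsistencyLevel.ONE),
        ("greeting",),
    ).one()
    print(row.v if row else None)
    cluster.shutdown()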