Got it. Thanks for the responses and the patience.

Tim

-----Original Message-----
From: "Joseph Blomstedt" <j...@basho.com>
Sent: Thursday, January 5, 2012 2:14pm
To: "Tim Robinson" <t...@blackstag.com>
Cc: "Aphyr" <ap...@aphyr.com>, riak-users@lists.basho.com
Subject: Re: Absolute consistency

My Internet connection went down as I was writing an email. It looks like
everyone already did a great job answering the availability questions, but
I might as well chime in as a Basho engineer. On a side note, it looks like
we've completely hijacked the "Absolute consistency" question initially
posed. I'll email out some thoughts on that later.

> Why would anyone ever want 2 copies on one physical PC?

You wouldn't want 2 copies on one machine. But if a user requests 3
replicas and there are only 2 machines, then that third replica has to go
somewhere. This is a common scenario during initial data loading,
development, and testing/evaluation of Riak. For example, people often
start up a single-node Riak cluster, load up some data, and then add 2 or
more nodes to turn it into a properly sized cluster. Before the new nodes
are added, all the replicas live on the single node. As nodes are added,
replicas move to the additional nodes to ensure availability.

Simply put, unless you have enough machines to hold all your requested
replicas, there isn't much Riak can do for you. You could certainly argue
that Riak could merge replicas and write only 2 in the reduced-node case,
then copy a replica out once enough machines are added to match or exceed
the replica count. But that's additional complexity for very minor gain. I
would rather Riak have a single, easily understood operating mode that you
can rely on to reason about your availability guarantees than have
alternative operating modes that depend on cluster sizing. If you want to
run a Riak cluster with N=3, you should have 3+ nodes.
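To make the placement behavior concrete, here's a minimal sketch of the
idea in Python. It is not Riak's actual hashing or claim code; the
round-robin partition assignment, SHA-1 hashing, and 8-partition ring are
simplifying assumptions made purely for illustration:

import hashlib

RING_SIZE = 8                 # number of partitions (Riak's default is 64)
N_VAL = 3                     # requested number of replicas
nodes = ["node1", "node2"]    # only two physical machines

# Assign partitions to nodes round-robin (a simplified claim strategy).
owners = [nodes[i % len(nodes)] for i in range(RING_SIZE)]

def preflist(key):
    """Hash the key onto the ring and take the next N_VAL partitions."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    start = h % RING_SIZE
    return [owners[(start + i) % RING_SIZE] for i in range(N_VAL)]

pl = preflist("some-bucket/some-key")
print(pl)                     # e.g. ['node1', 'node2', 'node1']
print(len(set(pl)), "distinct machine(s) hold the 3 replicas")

With only 2 machines, any 3 consecutive partitions are owned by at most 2
distinct nodes, so one machine always ends up holding 2 of the 3 replicas.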
With that said, Kyle correctly mentioned edge cases where having more
nodes than N can still lead to reduced availability. Specifically, if the
number of nodes does not cleanly divide the ring size, there may be
reduced-availability preference lists at the wrap-around point of the Riak
ring. For example, a 64-partition ring with 4 nodes won't have this
problem, but a 64-partition ring with 3 nodes may. This is considered a
bug, and is documented at:
https://issues.basho.com/show_bug.cgi?id=228

As noted on the issue, there are operational workarounds that ensure this
doesn't occur, such as going with 64/4 rather than 64/3. Fixing the issue
entirely is something we at Basho are working towards. The new ring claim
algorithm in the upcoming release of Riak makes the wrap-around issue much
less likely anytime you have more than N nodes. A future release will
address the issue more directly.
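As a rough illustration of that edge case, here's a toy model of the
wrap-around point, again assuming a round-robin claim and treating a
preference list as simply the next N consecutive partitions; both are
simplifications of what the real claim algorithm does:

RING_SIZE = 64
N_VAL = 3

def degraded_preflists(num_nodes):
    # Round-robin partition ownership, e.g. node1, node2, node3, node1, ...
    owners = [f"node{i % num_nodes + 1}" for i in range(RING_SIZE)]
    bad = []
    for start in range(RING_SIZE):
        pl = [owners[(start + i) % RING_SIZE] for i in range(N_VAL)]
        if len(set(pl)) < N_VAL:        # fewer distinct machines than N
            bad.append((start, pl))
    return bad

print(len(degraded_preflists(3)))  # > 0: some preflists land on only 2 machines
print(len(degraded_preflists(4)))  # 0: every preflist spans 3 distinct machines

With 3 nodes, the two preference lists that straddle the wrap-around point
land on only 2 distinct machines; with 4 nodes, which divides 64 evenly,
every preference list spans 3 distinct machines.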
-Joe

On Thu, Jan 5, 2012 at 1:12 PM, Tim Robinson <t...@blackstag.com> wrote:
> Thank you for this info. I'm still somewhat confused.
>
> Why would anyone ever want 2 copies on one physical PC? Correct me if I am
> wrong, but part of the sales pitch for Riak is that the cost of hardware is
> lessened by distributing your data across a cluster of less expensive
> machines, as opposed to having it all reside on one enormous server with
> very little redundancy.
>
> The 2 copies of data on one physical PC provide no redundancy, but increase
> hardware costs quite a bit.
>
> Right?
>
> Thanks,
> Tim
>
> -----Original Message-----
> From: "Aphyr" <ap...@aphyr.com>
> Sent: Thursday, January 5, 2012 1:01pm
> To: "Tim Robinson" <t...@blackstag.com>
> Cc: "Runar Jordahl" <runar.jord...@gmail.com>, riak-users@lists.basho.com
> Subject: Re: Absolute consistency
>
> On 01/05/2012 11:44 AM, Tim Robinson wrote:
>> Ouch.
>>
>> I'm shocked that is not considered a major bug. At minimum that kind of
>> stuff should be front and center in their wiki/docs. Here I am thinking
>> n=2 on a 3-node cluster means I'm covered when in fact I am not. It's the
>> whole reason I gave Riak consideration.
>>
>> Tim
>
> I think you may have this backwards. N=3 and 2 nodes would mean one node
> has 1 copy, and 1 node has 2 copies, of any given piece. For n=2 and 3
> nodes, there should be no overlap.
>
> The other thing to consider is that for certain combinations of
> partition number P and node number N, distributing partitions mod N can
> result in overlaps at the edge of the ring. This means zero to n
> preflists can overlap on some nodes. That means n=3 can, *with the wrong
> choice of N and P*, result in a minimum of 2 machines having copies of
> any given key, assuming P > N.
>
> There are also failure modes to consider. I haven't read the new key
> balancing algo, so my explanation may be out of date.
>
> --Kyle
>
>
> Tim Robinson
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Joseph Blomstedt <j...@basho.com>
Software Engineer
Basho Technologies, Inc.
http://www.basho.com/

Tim Robinson

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com