Justin,

Thanks a ton for the quick response, that makes perfect sense. It also explains why I've only seen this behavior in our test environment.

It may make sense to note this info in the "Replication" node of the wiki, perhaps in the "What does N=3 really mean?" section (there's an example of a 2-node cluster with N=3 in there). I think our case of "Let's quickly test this thing out on more than 1 box with default config" is probably common and this behavior could lead some to believe that Riak isn't replicating properly.
Thanks again.

-Alan

On Mon, Jun 14, 2010 at 12:54 PM, Justin Sheehy <jus...@basho.com> wrote:
> Hi, Alan.
>
> Your replicas do in fact exist on both nodes. However, I understand
> that the situation you are observing is confusing. I will attempt to
> explain.
>
> Quite some time ago, something surprising was noticed by some of our
> users during their pre-production testing. Some intentional failure
> scenarios (with busted nodes, etc) would fail much more slowly when
> R=1 than when R=2. This was due to the fact that to satisfy a R=1
> request with a non-object response (timeout or notfound), we would
> wait for all N nodes to reply. With R=2, we could send this response
> as soon as N-1 nodes reply. In some situations this is a dramatic
> difference in time.
>
> To remove this perceived problem we implemented what we refer to as
> "basic quorum". If a simple majority of vnodes have produced
> non-successful internal replies, we return a non-success value such as
> a notfound. This means that if there is only one copy of the object
> out there, and the node holding it is slowest to respond, the client
> will not see that object in their response but will instead get the
> notfound instead of waiting for the last node to respond or time out.
>
> (note that read-repair will still occur in any case)
>
> This could be avoided if we considered "not found" to be a success
> condition, but then in the above situation you would see not founds
> even with R=2. That would simply be defined as another kind of
> "successful" response. Either way, it is a tradeoff of different
> kinds of surprise.
>
> I hope that this explanation helps with your understanding.
>
> On another note, it's not useful to run Riak with a number of physical
> hosts less than your N value unless you're planning on expanding it
> soon. So: testing with 2 hosts and N=3 means that you are testing
> against a very much not-recommended configuration. I suggest either
> using more hosts or else changing your default bucket N value to 2.
>
> -Justin
>
>
> On Mon, Jun 14, 2010 at 1:59 PM, Alan McConnell <ala...@swingvine.com> wrote:
> > Hey Dan,
> > I have a 2-node cluster with default bucket settings (N=3, etc.), and if I
> > take one of the boxes down (and perform reads with R=1) I get tons of "key
> > not found" errors for keys I know exist in the cluster. Seems like for many
> > keys, all 3 replicas live on one host. From what you've written here
> > though, it seems like that should not happen. Do you know of any way my
> > cluster could have gotten into this state?
> > I did run a restore on this cluster using a riak-admin backup from a
> > different, single-node cluster. I wonder if that caused an uneven
> > distribution.
> > Any help would be appreciated. As it stands now our 2-node cluster has
> > serious read problems if either node goes down.
> > -Alan
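For anyone reading this thread later, the "basic quorum" rule Justin describes can be modeled in a few lines of Python. This is only a simplified sketch of the decision rule as stated in his message, not Riak's actual implementation: the function name `read_outcome`, the reply strings "ok" and "notfound", and the idea of feeding replies in as an ordered list are all assumptions made for illustration. Timeouts are folded into "notfound" to keep it short.

```python
from typing import List, Tuple


def read_outcome(replies: List[str], n: int, r: int,
                 basic_quorum: bool = True) -> Tuple[str, int]:
    """Hypothetical model of a read: replies arrive in list order.

    Returns (result, number_of_replies_consumed).
    """
    ok = 0
    failed = 0
    for count, reply in enumerate(replies, start=1):
        if reply == "ok":
            ok += 1
        else:
            failed += 1

        # Enough successful replies to satisfy R.
        if ok >= r:
            return ("ok", count)

        # Basic quorum: a simple majority of the N vnodes has failed.
        if basic_quorum and failed > n // 2:
            return ("notfound", count)

        # Without basic quorum: fail only once R can no longer be met.
        if failed > n - r:
            return ("notfound", count)

    # Ran out of replies before any rule fired.
    return ("timeout", len(replies))


if __name__ == "__main__":
    slow_copy = ["notfound", "notfound", "ok"]  # the one real copy answers last
    print(read_outcome(slow_copy, n=3, r=1, basic_quorum=True))
    print(read_outcome(slow_copy, n=3, r=1, basic_quorum=False))
```

With N=3 and R=1, and the only node holding the object replying last, the basic-quorum rule in this model answers notfound after two replies, while the older wait-for-all rule consumes the third reply and returns the object. That is the tradeoff described above: faster failures in exchange for occasionally missing a copy that does exist.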
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com