One thing that I *think* I've figured out is that the answer to "how many replicas can you lose and still stay up?" is actually n-w for writes and n-r for reads.
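Or, writing the arithmetic down the way I've been sanity-checking it (nothing Riak-specific here, just my own back-of-the-envelope, with n, r, and w being the ordinary bucket values):

    # Just sanity-checking my own quorum arithmetic; nothing fancier than subtraction.
    def replicas_you_can_lose(n, r, w):
        """How many replicas can disappear before strict-quorum requests start failing?"""
        return {"writes": n - w, "reads": n - r}

    print(replicas_you_can_lose(n=3, r=2, w=2))  # {'writes': 1, 'reads': 1}
    print(replicas_you_can_lose(n=5, r=3, w=3))  # {'writes': 2, 'reads': 2}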
So with n=3 and r=2 and w=2, the loss of two replicas due to an AZ failure means that I still *have* my data ("durability"), but I might lose _access_ to it ("availability") for a little bit. And with that weird feature Riak has (sloppy quorum with hinted handoff, I think is the name?), I might even be able to keep writing new data once my cluster figures out that the downed nodes are actually down; I think it just stores the writes on the remaining boxen, and eventually they get distributed back once the nodes return. Neat stuff.

So after working through all of that, I *think* I actually have an argument I can make for 4 replicas being somewhat superior to 5. Since I'm on AWS, I can scale by "embiggening" my nodes for a while, up until around the 128GB-RAM boxes; then I can start to double up on AZs (to keep things simple, I'd probably go from 4 straight to 8). At that point I'd probably have to do some math to figure out what new 'n' might make sense. Maybe n: 5, r: 3, w: 3? I'll cross that bridge when I come to it (and I know there's all kinds of awful misery with changing 'n' values on a bucket - forcing read-repairs and all kinds of stuff so that your reads and writes don't start failing - but by then I might have dedicated minions I could make figure that stuff out). Or maybe there's an inherent advantage to going straight to 8 instead of just 'embiggening'. Again, I'll cross that bridge (probably by talking to you all!) when I come to it.

I think Rack Awareness sounds like a *great* feature - but I'd also love something that's a little more strict about making sure my replicas never live on the same node. (Current advice is that you should have four boxes for an 'n' of 3 to ensure one box doesn't hold two copies of the same data; I'd love it if at some point they could make that guarantee with number of boxes = n. I understand it's being worked on.) Once rack awareness comes in - or the n = number-of-boxes fix comes in - I'll probably have to re-ponder my math. That'll be a good problem for me to have, though :)
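(A note mostly for future-me, and very much an untested sketch: my understanding is that the n/r/w bump itself would just be a bucket-properties PUT against Riak's HTTP interface, roughly like the snippet below - host, port, and 'mybucket' are placeholders for my setup - and it's the re-replication and read-repair fallout afterwards that's the actual misery.)

    # Untested sketch: bump a bucket's n/r/w via Riak's HTTP bucket-properties
    # endpoint. Host, port, and bucket name are placeholders for my setup.
    import json
    import requests  # assuming the requests library is handy

    resp = requests.put(
        "http://127.0.0.1:8098/buckets/mybucket/props",
        data=json.dumps({"props": {"n_val": 5, "r": 3, "w": 3}}),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()  # blow up loudly if Riak rejects the change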
-B.

On Tue, Aug 13, 2013 at 8:21 PM, John Eikenberry <j...@zhar.net> wrote:

> Brady Wetherington wrote:
>
> > First off - I know 5 instances is the "magic number" of instances to have. If I understand the thinking here, it's that at the default redundancy level ('n'?) of 3, it is most likely to start getting me some scaling (e.g., performance > just that of a single node), and yet also have redundancy; whereby I can lose one box and not start to take a performance hit.
>
> With n=3 wouldn't you just need to avoid having more than 2 (of 5) nodes in the same zone? With 5 nodes you shouldn't have to worry about replicas being on the same node, so if you only have 2 nodes in 1 zone you wouldn't lose data if you lost a zone.
>
> The only place I see there being a problem is in regions with only 2 zones or when you need to expand beyond the 2/zone number. Then you just have to do backups and accept that you will suffer an outage if you lose a zone.
>
> The cure for all this is having riak get so-called "rack awareness" so you can configure it to make sure that data is replicated across multiple zones. This is supposed to be coming at some point [1].
>
> [1] https://github.com/basho/riak/issues/308
>
> > My question is - I think I can only do 4 in a way that makes sense. I only have 4 AZ's that I can use right now; AWS won't let me boot instances in 1a. My concern is if I try to do 5, I will be "doubling up" in one AZ - and in AWS you're almost as likely to lose an entire AZ as you are a single instance. And so, if I have instances doubled up in one AZ (let's say us-east-1e), and then I lose 1e, I've now lost two instances. What are the chances that all three of my replicas of some chunk of my data are on those two instances? I know that it's not guaranteed that all replicas are on separate nodes.
> >
> > So is it better for me to ignore the recommendation of 5 nodes, and just do 4? Or to ignore the fact that I might be doubling up in one AZ? Also, another note: these are designed to be 'durable' nodes, so if one should go down I would expect to bring it back up *with* its data - or, if I couldn't, I would do a force-replace or replace and rebuild it from the other replicas. I'm definitely not doing instance-store. So I don't know if that mitigates my need for a full 5 nodes. I would also consider losing one node to be "degraded" and would probably seek to fix that problem as soon as possible, so I wouldn't expect to be in that situation for long. I would probably tolerate a drop in performance during that time, too. (Not a super-severe one, but 20-30 percent? Sure.)
> >
> > What do you folks think?
> >
> > -B.
>
> --
> John Eikenberry
> [ j...@zhar.net - http://zhar.net ]
> [ PGP public key @ http://zhar.net/jae_at_zhar_net.gpg ]
> ________________________________________________________________________
> Sic gorgiamus allos subjectatos nunc