Thanks for your thoughts, guys. I agree that with vnodes total downtime is lessened, although it also seems that the total number of outages (however small) would be greater.
But I think downtime is only lessened up to a certain cluster size. I'm thinking that as the cluster continues to grow:

- node rebuild time will max out (a single node only has so much write bandwidth)
- the probability of 2 nodes being down at any given time will continue to increase -- even if you consider only non-correlated failures.

Therefore, when adding nodes beyond the point where node rebuild time maxes out, both the total number of outages *and* overall downtime would increase?

Thanks,
Eric

On Mon, Dec 10, 2012 at 7:00 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Assuming you need to work with quorum in a non-vnode scenario. That means
> that if 2 nodes in a row in the ring are down some number of quorum
> operations will fail with UnavailableException (TimeoutException right
> after the failures). This is because for a given range of tokens quorum
> will be impossible, but quorum will be possible for others.
>
> In a vnode world if any two nodes are down, then the intersection of
> vnode token ranges they have is unavailable.
>
> I think it is two sides of the same coin.
>
>
> On Mon, Dec 10, 2012 at 7:41 AM, Richard Low <r...@acunu.com> wrote:
>
>> Hi Tyler,
>>
>> You're right, the math does assume independence which is unlikely to be
>> accurate. But if you do have correlated failure modes e.g. same power,
>> racks, DC, etc. then you can still use Cassandra's rack-aware or DC-aware
>> features to ensure replicas are spread around so your cluster can survive
>> the correlated failure mode. So I would expect vnodes to improve uptime in
>> all scenarios, but haven't done the math to prove it.
>>
>> Richard.
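
P.S. A rough sketch of the claim in my message that the probability of 2 nodes being down at the same time keeps growing with cluster size, even for non-correlated failures. It assumes each node is independently down (failed or rebuilding) a fixed fraction of the time, which is only plausible once rebuild time has maxed out. The function name and the value of `p_down` are made up for illustration, not measured from any cluster, and the sketch says nothing about how much data is affected per outage.

```python
def prob_at_least_two_down(n_nodes: int, p_down: float) -> float:
    """P(at least 2 of n_nodes are down at a given instant).

    Assumes every node is down independently with probability p_down
    (a hypothetical per-node unavailability, not a measured figure).
    """
    p_none = (1.0 - p_down) ** n_nodes
    p_exactly_one = n_nodes * p_down * (1.0 - p_down) ** (n_nodes - 1)
    return 1.0 - p_none - p_exactly_one


if __name__ == "__main__":
    p_down = 0.001  # hypothetical: each node is down 0.1% of the time
    for n in (10, 50, 100, 500, 1000):
        print(f"{n:5d} nodes -> P(>=2 down) = {prob_at_least_two_down(n, p_down):.6f}")
```

With `p_down` held constant the result rises with every node added. Whether total downtime also rises depends on how large the affected token range is per outage, which this sketch does not model.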