Excerpts from Fox, Kevin M's message of 2015-11-05 13:18:13 -0800:
> You're assuming there are only 2 choices,
> zk or db+rabbit. I'm claiming both are suboptimal at present. a 3rd might
> be needed. Though even with its flaws, the db+rabbit choice has a few
> benefits too.
>

Well, I'm assuming it is zk/etcd/consul, because while the java argument
is rather religious, the reality is all three are significantly different
from databases and message queues and thus will be "snowflakes". But yes,
I _am_ assuming that Zookeeper is a natural, logical, simple choice, and
the fact that it runs in a jvm is a poor reason to avoid it.

> You also seem to assert that to support large clouds, the default must be
> something that can scale that large. While that would be nice, I don't think
> it's a requirement if it's overly burdensome on deployers of non huge clouds.
>

I think the current solution even scales poorly for medium sized clouds.
Only the tiniest of clouds with the fewest nodes can really sustain all
of that polling without incurring overhead that would be better spent on
servicing users.

> I don't have metrics, but I would be surprised if most deployments today
> (production + other) used 3 controllers with a full ha setup. I would guess
> that the majority are single controller setups. With those, the overhead of
> maintaining a whole dlm like zk seems like overkill. If db+rabbit would work
> for that one case, that would be one less thing to have to set up for an op.
> They already have to set up db+rabbit. Or even a clm plugin of some sort, that
> won't scale, but would be very easy to deploy, and change out later when
> needed would be very useful.
>

We do have metrics:

http://www.openstack.org/assets/survey/Public-User-Survey-Report.pdf

Page 35, "How many physical compute nodes do OpenStack clouds have?"

10-99:     42%
1-9:       36%
100-999:   15%
1000-9999:  7%

So for respondents to that survey, yes, "most" are running fewer than 100
nodes.

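A minimal sketch of what happens when those buckets are weighted by node
count instead of by respondent count. It assumes each share applies to all
154 respondents (the count given in the extrapolation below) and that every
cloud in a bucket sits at either the bottom or the top of its range. The
names BUCKETS, RESPONDENTS and node_range are made up for illustration.

```python
# Survey buckets: (min nodes, max nodes, percent of respondents).
BUCKETS = [
    (1, 9, 36),
    (10, 99, 42),
    (100, 999, 15),
    (1000, 9999, 7),
]

RESPONDENTS = 154  # respondent count quoted in this thread


def node_range(low, high, percent, respondents=RESPONDENTS):
    """Return (min_nodes, max_nodes) for one bucket.

    Assumes every cloud in the bucket is at the bottom (min) or the top
    (max) of its size range. Integer math, truncating fractions.
    """
    return (respondents * percent * low // 100,
            respondents * percent * high // 100)


if __name__ == "__main__":
    for low, high, percent in BUCKETS:
        min_nodes, max_nodes = node_range(low, high, percent)
        print(f"{low}-{high} * {percent}% = {min_nodes} - {max_nodes} nodes")
```

Truncating this way gives the same upper bounds as the table that follows;
some of the lower bounds there are rounded down further.
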
However, by compute node count, if we extrapolate a bit:

There were 154 respondents so:

10-99 * 42% =        640 - 6403 nodes
1-9 * 36% =           55 - 498 nodes
100-999 * 15% =     2300 - 23076 nodes
1000-9999 * 7% =   10000 - 107789 nodes

So in terms of the number of actual computers running OpenStack compute,
as an example, from the survey respondents, there are more computes
running in *one* of the clouds with more than 1000 nodes than there are
in *all* of the clouds with fewer than 10 nodes, and certainly more in all
of the clouds over 1000 nodes, than in all of the clouds with fewer than
100 nodes.

What this means, to me, is that the investment in OpenStack should focus
on those with > 1000, since those orgs are definitely investing a lot more
today. We shouldn't make it _hard_ to do a tiny cloud, but I think it's ok
to make the tiny cloud less efficient if it means we can grow it into a
monster cloud at any point and we continue to garner support from orgs who
need to build large scale clouds.

(I realize I'm biased because I want to build a cloud with more than 1000
nodes ;)

> etcd is starting to show up in a lot of other projects, and so it may be at
> sites already. being able to support it may be less of a burden to operators
> than zk in some cases.
>

Sure, just like some shops already have postgres and in theory you can
still run OpenStack on postgres. But the testing level for postgres
support is so abysmal that I'd be surprised if anybody was actually
_choosing_ to do this. I can see this going the same way, where we give
everyone a choice, but then end up with almost nobody using any
alternative choices because the community has only rallied around the
one dominant choice.

> If your cloud grows to the point where the dlm choice really matters for
> scalability/correctness, then you probably have enough staff members to deal
> with adding in zk, and that's probably the right choice.
>

If your cloud is 40 compute nodes, and three nines (which, let's face it,
that's the availability profile of a cloud with one controller), we can
just throw Zookeeper up untuned and satisfy the needs. Why would we want
to put up a custom homegrown db+mq solution and then force a change later
on if the cloud grows? A single code path seems a lot better than
multiple code paths, some of which are not really well tested.

> You can have multiple suggested things in addition to one default. Default to
> the thing that makes the most sense in the most common deployments, and make
> specific recommendations for certain scenarios. like, "if greater than 100
> nodes, we strongly recommend using zk" or something to that effect.
>

Choices are not free either. Just edit that statement there: "We strongly
recommend using zk." Nothing about ZK, etcd, or consul invalidates
running on a small cloud. In many ways it makes things simpler, since the
user doesn't have to decide on a DLM, but instead just installs the thing
we recommend.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev