You're assuming there are only 2 choices,
zk or db+rabbit. I'm claiming both are suboptimal at present. A 3rd might be 
needed. Though even with its flaws, the db+rabbit choice has a few benefits too.

You also seem to assert that to support large clouds, the default must be 
something that can scale that large. While that would be nice, I don't think 
it's a requirement if it's overly burdensome on deployers of non-huge clouds.

I don't have metrics, but I would be surprised if most deployments today 
(production + other) used 3 controllers with a full HA setup. I would guess 
that the majority are single-controller setups. With those, the overhead of 
maintaining a whole dlm like zk seems like overkill. If db+rabbit would work 
for that one case, that would be one less thing for an op to set up; they 
already have to set up db+rabbit. Or even a dlm plugin of some sort that 
won't scale, but is very easy to deploy and can be swapped out later when 
needed, would be very useful.

etcd is starting to show up in a lot of other projects, and so it may be at 
sites already. Being able to support it may be less of a burden to operators 
than zk in some cases.

If your cloud grows to the point where the dlm choice really matters for 
scalability/correctness, then you probably have enough staff members to deal 
with adding in zk, and that's probably the right choice.

You can have multiple suggested things in addition to one default. Default to 
the thing that makes the most sense in the most common deployments, and make 
specific recommendations for certain scenarios, like "if greater than 100 
nodes, we strongly recommend using zk" or something to that effect.
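
To make the "one default plus recommendations" idea concrete, here is a rough 
sketch. The function name, the "db+rabbit" default, and the backend labels are 
only illustrative; the 100 node cutoff is just the example number from the 
sentence above, not a measured limit.

```python
# Hypothetical helper: pick a suggested dlm backend from deployment size.
# The threshold is the illustrative figure from the text, not a benchmark.
ZK_RECOMMENDED_ABOVE_NODES = 100


def recommend_dlm_backend(node_count):
    """Return a suggested backend label for a cloud with node_count nodes."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    if node_count > ZK_RECOMMENDED_ABOVE_NODES:
        # Large cloud: the scalability/correctness of the dlm starts to matter.
        return "zk"
    # Small or single-controller cloud: reuse what the op already runs.
    return "db+rabbit"


if __name__ == "__main__":
    for nodes in (1, 100, 101, 5000):
        print(nodes, recommend_dlm_backend(nodes))
```

The point is only that the default and the recommendation can differ; the 
real cutoff would have to come from actual deployment data.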

Thanks,
Kevin


________________________________________
From: Clint Byrum [cl...@fewbar.com]
Sent: Thursday, November 05, 2015 11:44 AM
To: openstack-dev
Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager  
discussion @ the summit

Excerpts from Fox, Kevin M's message of 2015-11-04 14:32:42 -0800:
> To clarify that statement a little more,
>
> Speaking only for myself as an op, I don't want to support yet one more 
> snowflake in a sea of snowflakes, that works differently than all the rest, 
> without a very good reason.
>
> Java has its own set of issues associated with the JVM. Care-and-feeding 
> sorts of things. If we are to invest time/money/people in learning how to 
> properly maintain it, it's easier to justify if it's not just a one-off for 
> just DLM.
>
> So I wouldn't go so far as to say we're vehemently opposed to java, just that 
> DLM on its own is probably not a strong enough feature to justify requiring 
> pulling in java. It's been only a very recent thing that you 
> could convince folks that DLM was needed at all. So either make java 
> optional, or find some other use case that needs java badly enough that you 
> can make java a required component. I suspect some day searchlight might be 
> compelling enough for that, but not today.
>
> As for the default, the default should be a good reference. If most sites would 
> run with etcd or something else since java isn't needed, then don't default 
> zookeeper on.
>

There are a number of reasons, but the most important are:

* Resilience in the face of failures - The current database+MQ based
  solutions are all custom made and have unknown characteristics when
  there are network partitions and node failures.
* Scalability - The current database+MQ solutions rely on polling the
  database and/or sending lots of heartbeat messages or even using the
  database to store heartbeat transactions. This scales fine for tiny
  clusters, but when every new node adds more churn to the MQ and
  database, this will (and has been observed to) become intractable.
* Tech debt - OpenStack is inventing lock solutions and then maintaining
  them. And service discovery solutions, and then maintaining them.
  Wouldn't you rather have better upgrade stories, more stability, more
  scale, and more features?
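
To put the scalability point in numbers, here is a back-of-the-envelope
sketch. It assumes each node writes exactly one heartbeat to the database
per interval, and the 10 second interval is a made-up figure for
illustration, not taken from any particular service.

```python
# Rough model (an assumption, not a measurement): every node writes one
# heartbeat row per interval, so write load grows linearly with node count.
def heartbeat_writes_per_second(node_count, interval_seconds=10.0):
    """Estimate database heartbeat writes per second for a cluster."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return node_count / interval_seconds


if __name__ == "__main__":
    for nodes in (10, 500, 5000):
        print(nodes, "nodes:", heartbeat_writes_per_second(nodes), "writes/s")
```

Under those assumptions the load is negligible for a handful of nodes and
becomes a steady stream of writes at a few thousand, before counting any
polling reads on top.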

If those aren't compelling enough reasons to deploy a mature java service
like Zookeeper, I don't know what would be. But I do think using the
abstraction layer of tooz will at least allow us to move forward without
having to convince everybody everywhere that this is actually just the
path of least resistance.
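
For anyone who hasn't looked at tooz, here is a minimal sketch of what the
abstraction looks like from a service's point of view. The helper name
run_locked, the member id, the lock name, and the URLs in the comment are
hypothetical; get_coordinator, start, get_lock, and stop are the tooz
coordination calls as I understand them, so check the tooz docs before
relying on this.

```python
def run_locked(backend_url, member_id, lock_name, work):
    """Run work() while holding a distributed lock obtained through tooz.

    backend_url selects the driver, so moving between backends is meant to
    be a configuration change rather than a code change.
    """
    # Third-party dependency, imported lazily so this module loads without it.
    from tooz import coordination

    coordinator = coordination.get_coordinator(backend_url, member_id)
    coordinator.start()
    try:
        # tooz locks can be used as context managers.
        with coordinator.get_lock(lock_name):
            return work()
    finally:
        coordinator.stop()


# Hypothetical usage; only the URL changes between backends:
#   run_locked("zookeeper://127.0.0.1:2181", b"host-1", b"my-lock", do_work)
#   run_locked("mysql://user:pass@dbhost/tooz", b"host-1", b"my-lock", do_work)
```

The calling code stays the same whichever backend a deployer picks, which is
what lets us move forward without settling the default first.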

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
