On Mon, Nov 9, 2015, at 10:24 PM, Kevin Carter wrote:
> Hello all,
>
> The rationale behind using a solution like zookeeper makes sense; however, in reviewing the thread I found myself asking if there was a better way to address the problem without adding a Java-based solution as the default. While it has been covered that the current implementation would be a reference and that "other" driver support in Tooz would allow for any backend a deployer may want, the work being proposed within devstack [0] would become the default development case, thus making it the de facto standard, and I think we could do better in terms of supporting developers and delivering capability.
>
> My thoughts on using Redis+Redislock instead of Java+Zookeeper as the default option:
> * Tooz already supports redislock.
> * Redis has an established cluster system known for general ease of use and reliability on distributed systems.
This one I somewhat suspect; the clustering support was only released about six months ago: https://github.com/antirez/redis/blob/3.0/00-RELEASENOTES#L130 So I'm not exactly sure how established it is (or even how widely deployed and tested); does anyone have experience with it, configuring it, handling its failure modes? It'd be nice to know how it works (and I'm generally curious).

> * Several OpenStack projects already support Redis as a backend option or have extended capabilities using Redis.
> * Redis can be deployed on RHEL, SUSE, and DEB-based systems with ease.
> * Redis is open-source software licensed under the three-clause BSD license and would not have any of the questionable license implications found when dealing with anything Java.
> * The inclusion of Redis would work on a single node, allowing developers to continue working in VMs running on laptops with 4 GB of RAM, but would also scale to support the multi-controller use case with ease. This would also give developers the ability to work on systems that actually resemble production.
> * Redislock brings with it no additional developer-facing language dependencies (Redis is written in ANSI C and works ... without external dependencies [1]) while also providing a plethora of language bindings [2].
>
> I apologize for questioning the proposed solution so late into the development of this thread, and for not making the summit conversations to talk more with everyone who worked on the proposal. While the ship may have sailed on this point for now, I figured I'd ask why we might go down the path of Zookeeper+Java when a solution with likely little to no development effort already exists, can support just about any production/development environment, has lots of bindings, and (IMHO) would integrate with the larger community more easily; many OpenStack developers and deployers already know Redis. Including ZK+Java in DevStack and making it the default essentially creates new hard dependencies, one of which is Java, and I'd like to avoid that if at all possible; basically, I think we can do better.
>
> [0] - https://review.openstack.org/#/c/241040/
> [1] - http://redis.io/topics/introduction
> [2] - http://redis.io/topics/distlock
>
> --
>
> Kevin Carter
> IRC: cloudnull
>
> ________________________________________
> From: Fox, Kevin M <kevin....@pnnl.gov>
> Sent: Monday, November 9, 2015 1:54 PM
> To: maishsk+openst...@maishsk.com; OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit
>
> Dedicating 3 controller nodes in a small cloud is not always the best allocation of resources. You're thinking of medium to large clouds. Small production clouds are a thing too, and at that scale a little downtime, if you actually hit the rare case of a node failure on the controller, may be acceptable. It's up to an op to decide.
>
> We've also experienced that HA software sometimes causes more, or longer, downtime than it solves, due to its complexity, the knowledge required, proper testing, etc. Again, the risk gets higher the smaller the cloud is, in some ways.
>
> Being able to keep it simple and small for that case, then scale by switching out pieces as needed, does have some tangible benefits.
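On the point above that Tooz already supports redislock, and on being able to switch out pieces as needed: for what it's worth, the code that consumes the lock looks the same whichever backend ends up as the default; only the connection URL changes. A minimal sketch of what I mean, assuming a recent Tooz and a local Redis on its default port (the member and lock names are made up for illustration):

    from tooz import coordination

    # The backend is chosen purely by this URL; pointing it at
    # 'zookeeper://127.0.0.1:2181' (or any other Tooz driver) should be
    # the only change a deployer has to make.
    coordinator = coordination.get_coordinator(
        'redis://127.0.0.1:6379', b'api-worker-1')
    coordinator.start()

    # Take a named distributed lock around the critical section.
    lock = coordinator.get_lock(b'resize-instance-lock')
    with lock:
        do_the_risky_thing()  # placeholder for whatever is being serialized

    coordinator.stop()

I haven't run exactly this, but it's roughly the shape of the API either way, which is sort of the point of going through Tooz in the first place.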
>
> Thanks,
> Kevin
> ________________________________________
> From: Maish Saidel-Keesing [mais...@maishsk.com]
> Sent: Monday, November 09, 2015 11:35 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit
>
> On 11/05/15 23:18, Fox, Kevin M wrote:
> > You're assuming there are only two choices, zk or db+rabbit. I'm claiming both are suboptimal at present; a third might be needed. Though even with its flaws, the db+rabbit choice has a few benefits too.
> >
> > You also seem to assert that to support large clouds, the default must be something that can scale that large. While that would be nice, I don't think it's a requirement if it's overly burdensome on deployers of non-huge clouds.
> >
> > I don't have metrics, but I would be surprised if most deployments today (production + other) used 3 controllers with a full HA setup. I would guess that the majority are single-controller setups. With those, the
> I think it would be safe to assume that any kind of production cloud, or any operator that considers their OpenStack environment something close to production ready, would not be daft enough to deploy their whole environment based on a single controller, which is a whopper of a single point of failure.
>
> Most Fuel (Mirantis) deployments use multiple controllers.
> RHOS also recommends multiple controllers.
>
> I don't think that we as a community can afford to assume that one controller will suffice.
> That does not mean that maintaining zk will be any easier, though.
> > overhead of maintaining a whole dlm like zk seems like overkill. If db+rabbit would work for that one case, that would be one less thing for an op to set up; they already have to set up db+rabbit. Or even a dlm plugin of some sort that won't scale but would be very easy to deploy and change out later when needed would be very useful.
> >
> > etcd is starting to show up in a lot of other projects, and so it may be at sites already. Being able to support it may be less of a burden to operators than zk in some cases.
> >
> > If your cloud grows to the point where the dlm choice really matters for scalability/correctness, then you probably have enough staff members to deal with adding in zk, and that's probably the right choice.
> >
> > You can have multiple suggested things in addition to one default. Default to the thing that makes the most sense in the most common deployments, and make specific recommendations for certain scenarios: "if greater than 100 nodes, we strongly recommend using zk", or something to that effect.
> >
> > Thanks,
> > Kevin
> >
> > ________________________________________
> > From: Clint Byrum [cl...@fewbar.com]
> > Sent: Thursday, November 05, 2015 11:44 AM
> > To: openstack-dev
> > Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit
> >
> > Excerpts from Fox, Kevin M's message of 2015-11-04 14:32:42 -0800:
> >> To clarify that statement a little more,
> >>
> >> Speaking only for myself as an op, I don't want to support yet one more snowflake in a sea of snowflakes, one that works differently than all the rest, without a very good reason.
> >>
> >> Java has its own set of issues associated with the JVM: care-and-feeding sorts of things.
> >> If we are to invest time/money/people in learning how to properly maintain it, it's easier to justify if it's not a one-off for just DLM.
> >>
> >> So I wouldn't go so far as to say we're vehemently opposed to java, just that DLM is probably not a strong enough feature on its own to justify requiring pulling in java. It's been only a very recent thing that you could convince folks that DLM was needed at all. So either make java optional, or find some other use case that needs java badly enough that you can make java a required component. I suspect some day searchlight might be compelling enough for that, but not today.
> >>
> >> As for the default, the default should be a good reference. If most sites would run with etcd or something else, since java isn't needed, then don't default zookeeper on.
> >>
> > There are a number of reasons, but the most important are:
> >
> > * Resilience in the face of failures: the current database+MQ based solutions are all custom-made and have unknown characteristics when there are network partitions and node failures.
> > * Scalability: the current database+MQ solutions rely on polling the database and/or sending lots of heartbeat messages, or even using the database to store heartbeat transactions. This scales fine for tiny clusters, but when every new node adds more churn to the MQ and database, this will be (and has been observed to be) intractable.
> > * Tech debt: OpenStack is inventing lock solutions and then maintaining them. And service discovery solutions, and then maintaining them. Wouldn't you rather have better upgrade stories, more stability, more scale, and more features?
> >
> > If those aren't compelling enough reasons to deploy a mature java service like Zookeeper, I don't know what would be. But I do think using the abstraction layer of tooz will at least allow us to move forward without having to convince everybody everywhere that this is actually just the path of least resistance.
>
> --
> Best Regards,
> Maish Saidel-Keesing
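One closing thought on Clint's scalability and tech-debt points: the same Tooz coordinator that gives us locks also covers the group-membership/heartbeat side, which is the part projects keep reinventing on top of the database today. A rough sketch, with the backend URL, group, and member names purely as placeholders (nothing here is blessed, and I haven't run exactly this):

    from tooz import coordination

    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'compute-host-7')
    coordinator.start()

    group = b'compute-services'
    # Best-effort group creation (a small race is possible; real code
    # would handle the "group already exists" error from the driver).
    if group not in coordinator.get_groups().get():
        coordinator.create_group(group).get()
    coordinator.join_group(group).get()

    # Liveness comes from heartbeating the coordinator rather than
    # writing periodic "I'm alive" rows into the database.
    coordinator.heartbeat()

    # Anyone can then ask which members are currently alive in the group.
    members = coordinator.get_members(group).get()
    print(members)

    coordinator.stop()

That's the sort of thing that would get us out of polling the database for service liveness, regardless of which backend ends up as the default.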