On Thu, 12 Jun 2014 05:06:38 PM Julien Danjou wrote:
> On Thu, Jun 12 2014, Matthew Booth wrote:
> > This looks interesting. It doesn't have hooks for fencing, though.
> >
> > What's the status of tooz? Would you be interested in adding fencing
> > hooks?
>
> It's maintained and developed, we have plans to use it in Ceilometer and
> other projects. Joshua also wants to use it for Taskflow.
>
> We are blocked for now by https://review.openstack.org/#/c/93443/ and by
> the lack of resources to complete that request obviously, so help
> appreciated. :)
>
> As for fencing hooks, it sounds like a good idea.

As far as I understand these things, in distributed-locking-speak
"fencing" just means "breaking someone else's lock".

I think your options here are (and apologies if I'm repeating things
that are obvious):

1. Have a "force unlock" protocol (numerous alternatives exist).
   Assume the lock holder implements it properly and stops accessing
   the shared resource when asked.

2. Kill the lock holder using some method unrelated to the locking
   service and wait for the locking protocol to realise the ex-holder
   is dead through the usual liveness tests. Assume that not being able
   to hold the lock implies no longer being able to access the shared
   resource.

The "liveness test" usually involves the holder pinging the lock
service periodically, and everyone has to wait for some agreed timeout
before assuming a client is dead.

(1) involves a lot of trust - and seems particularly bad if the reason
you are breaking the lock is that the holder is misbehaving.

Assuming (2) is the only reasonable choice, I don't think the lock
service needs explicit support for fencing, since the exact method for
killing the holder is unrelated, and relatively uninteresting (probably
always going to be an instance delete in OS).

Perhaps more interesting is exactly what conditions you require before
attempting to kill the lock holder - you wouldn't want just any job
deciding it was warranted, or else a misbehaving client would cause
mayhem. Again, I suggest your options here are:

1. Require human judgement, i.e. provide monitoring for whatever is
   misbehaving and make it obvious that one mitigation is to nuke the
   apparent holder.

2. Require the lock breaker to be able to reach a majority of nodes as
   some proof of "I'm working, my opinion must be right".
   In a Paxos system, reaching a majority of nodes basically becomes
   "hold a lock", so we end up back at "my liveness test is better than
   yours somehow", and I'm not sure how to resolve that without human
   judgement (but I'm not familiar with existing approaches).
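
To make the liveness test and the majority check above a bit more concrete, here is a rough sketch in Python. It is a toy in-memory model only, not how tooz or zookeeper implement anything: the names LeaseLock, heartbeat and has_majority are made up for illustration, and the clock is injected so the timeout can be exercised without sleeping.

```python
import time


class LeaseLock:
    """Toy in-memory lock: the holder must ping within `timeout` seconds.

    Hypothetical illustration only; a real lock service would replicate
    this state and deal with network delays and clock skew.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # agreed liveness timeout, in seconds
        self.clock = clock          # callable returning the current time
        self.holder = None
        self.last_seen = None

    def _expire_if_dead(self):
        # Liveness test: a holder that has not pinged within the agreed
        # timeout is presumed dead and loses the lock.
        if self.holder is not None:
            if self.clock() - self.last_seen >= self.timeout:
                self.holder = None

    def acquire(self, client):
        self._expire_if_dead()
        if self.holder is None or self.holder == client:
            self.holder = client
            self.last_seen = self.clock()
            return True
        return False

    def heartbeat(self, client):
        # Returns False if the client no longer holds the lock, which is
        # its cue to stop touching the shared resource.
        self._expire_if_dead()
        if self.holder != client:
            return False
        self.last_seen = self.clock()
        return True


def has_majority(reachable, cluster_size):
    """True if `reachable` nodes form a strict majority of the cluster."""
    return reachable > cluster_size // 2


if __name__ == "__main__":
    now = [0]
    lock = LeaseLock(timeout=10, clock=lambda: now[0])
    print(lock.acquire("a"))    # "a" takes the lock
    print(lock.acquire("b"))    # "b" is refused while "a" is alive
    now[0] = 15                 # "a" was killed and stopped pinging
    print(lock.acquire("b"))    # lease expired, "b" may take over
```

Note that nothing here fences anyone: the service only notices that the holder stopped pinging, and the killing happens out of band, as in option (2) of the first list.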

Again, I don't think this needs additional support from the lock
service, beyond a liveness test (which ZooKeeper, for example, has).

tl;dr: I'm interested in what sort of automated fencing behaviour you'd
like.

-- 
 - Gus