On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote:
> On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor <gegui...@redhat.com> wrote:
> > I know we've all been looking at the HA Active-Active problem in Cinder
> > and trying our best to figure out possible solutions to the different
> > issues, and since current plan is going to take a while (because it
> > requires that we finish first fixing Cinder-Nova interactions), I've been
> > looking at alternatives that allow Active-Active configurations without
> > needing to wait for those changes to take effect.
> >
> > And I think I have found a possible solution, but since the HA A-A
> > problem has a lot of moving parts I ended up upgrading my initial
> > Etherpad notes to a post [1].
> >
> > Even if we decide that this is not the way to go, which we'll probably
> > do, I still think that the post brings a little clarity on all the
> > moving parts of the problem, even some that are not reflected on our
> > Etherpad [2], and it can help us not miss anything when deciding on a
> > different solution.
>
> Based on IRC conversations in the Cinder room and hearing people's
> opinions in the spec reviews, I'm not convinced the complexity that a
> distributed lock manager adds to Cinder for both developers and the
> operators who ultimately are going to have to learn to maintain things
> like Zoo Keeper as a result is worth it.
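A minimal Python sketch of the idea argued in the reply that follows: Active-Passive deployments keep using local file locks, only Active-Active deployments need a DLM, and the calling code does not change. It is loosely modeled on the way Tooz selects a driver from a backend URL, but `get_lock`, `file_lock` and `delete_volume` are hypothetical names, not the Tooz or oslo API. The distributed branch is deliberately left unimplemented rather than guessing at a ZooKeeper or Redis client, and the file lock uses `fcntl.flock`, so the sketch is Unix-only.

```python
import contextlib
import fcntl
import os
import tempfile
from urllib.parse import urlparse


@contextlib.contextmanager
def file_lock(lock_dir, name):
    """Hold an exclusive advisory lock (flock) on <lock_dir>/<name>.lock."""
    path = os.path.join(lock_dir, name + ".lock")
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until no other holder
        try:
            yield path
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def get_lock(backend_url, name):
    """Hypothetical helper: choose a lock implementation from a backend URL."""
    parsed = urlparse(backend_url)
    if parsed.scheme == "file":
        # Active-Passive: a local file lock is enough, as today.
        return file_lock(parsed.path, name)
    # Active-Active: a real DLM client (ZooKeeper, Redis, ...) would go here.
    raise NotImplementedError("no DLM driver for scheme %r" % parsed.scheme)


def delete_volume(backend_url, volume_id, log):
    """Hypothetical caller: the code is the same whichever backend is used."""
    with get_lock(backend_url, "volume-" + volume_id):
        log.append("deleting " + volume_id)


if __name__ == "__main__":
    lock_dir = tempfile.mkdtemp()
    events = []
    delete_volume("file://" + lock_dir, "vol-1", events)
    print(events)  # ['deleting vol-1']
```

The point of the sketch is that `delete_volume` never mentions the backend: moving a deployment from a `file://` URL to a DLM URL would be a configuration change, which is the developer-side argument made in the reply.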

Hi Mike,

I think you are right to bring up the cost that adding a DLM to the
solution imposes on operators; it is something important to take into
consideration. I would like to say that Ceilometer is already using Tooz,
so operators are already familiar with these DLMs, but unfortunately that
would be stretching the truth: Cinder is present in 73% of OpenStack
production workloads while Ceilometer is only in 33% of them, so we would
certainly be disturbing some operators.

But we must not forget that the only operators who would need to worry
about deploying and maintaining the DLM are those wanting to deploy
Active-Active configurations (for Active-Passive configurations Tooz will
be working with local file locks, as we are doing now), and some of those
may think like Duncan does: "I already have to administer rabbit, mysql,
backends, horizon, load balancers, rate limiters... adding redis isn't
going to make it that much harder".

That's why I don't think this is such a big deal for the vast majority of
operators.

On the developer side I have to disagree: there is no difference between
using Tooz and using the current oslo synchronization mechanism for
non-Active-Active deployments.

>
> **Key point**: We're not scaling Cinder itself, it's about scaling to
> avoid build up of operations from the storage backend solutions
> themselves.

You must also consider that an Active-Active solution will help
deployments where downtime is not an option, or that have SLAs with uptime
or operational requirements; it's not only about increasing the volume of
operations and reducing times.

>
> Whatever people think ZooKeeper "scaling level" is going to accomplish
> is not even a question. We don't need it, because Cinder isn't as
> complex as people are making it.
>
> I'd like to think the Cinder team is a great in recognizing potential
> cross project initiatives. Look at what Thang Pham has done with
> Nova's version object solution.
> He made a generic solution into an
> Oslo solution for all, and Cinder is using it. That was awesome, and
> people really appreciated that there was a focus for other projects to
> get better, not just Cinder.

To be fair, Tooz is just one of those cross-project initiatives you are
describing: it's a generic solution that can be used in all projects, not
just Ceilometer.

>
> Have people consider Ironic's hash ring solution? The project Akanda
> is now adopting it [1], and I think it might have potential. I'd
> appreciate it if interested parties could have this evaluated before
> the Cinder midcycle sprint next week, to be ready for discussion.
>

I will have a look at the hash ring solution you mention and see if it
makes sense to use it.

And I would really love to see the HA A-A discussion enabled for remote
people, as some of us are interested in the discussion but won't be able
to attend. In my case, it's the problem of living in the Old World :-(

In a way I have to agree with you that sometimes we make Cinder look more
complex than it really is, and in my case the solution I proposed in the
post was way too complex, as has been pointed out. I just tried to solve
the A-A problem and fix some other issues, like recovering lost jobs
(those waiting for locks), at the same time.

There is an alternative solution I am considering that will be much
simpler and will align with Walter's efforts to remove locks from the
Volume Manager. I just need to give it some hard thought to make sure the
solution has all bases covered.

The main reason why I am suggesting using Tooz and a DLM is that I think
it will allow us to reach Active-Active faster and with less effort, not
because I think it will fix all our problems or that we'll have to keep
using it forever. It's basically a replacement for our current local
locks.

As I see it, the road to HA A-A for Cinder would look like this:

Step 1: Get A-A with Tooz locks and a DLM.
There are other pieces of the puzzle to solve for this, but those pieces
will carry over to the final solution.

Step 2: Remove locks from the manager; here we'll be keeping locks in
drivers.

Step 3: See which drivers can work without locks in Active-Passive
configurations (for example, LVM will still need local file locks to work,
as seen in bug #1460692). In Active-Active configurations there may be
some file-based solutions that require additional locks.

Looking for an alternative solution to a DLM will require more work and
bring more bugs into the code, and for what? After all, we are going to
get rid of any additional mechanism in the manager and just use the DB to
return "resource is busy" errors. We know that our current locking
mechanism works; let's use that to our advantage for a little while.

If people still think we should not go with a DLM I'll write a proposal
that doesn't need it, but it's going to be more work until we can see an
Active-Active configuration working, and we'll probably still need a DLM
for some drivers.

Cheers,
Gorka.

PS: I have given good thought to the solution you proposed the other day
and I can discuss it now.

> [1] - https://review.openstack.org/#/c/195366/
>
> --
> Mike Perez
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev