On Thu, Aug 4, 2016 at 11:14 PM, Russell Bryant <russ...@ovn.org> wrote:
>
> On Thu, Aug 4, 2016 at 8:17 PM, Andy Zhou <az...@ovn.org> wrote:
>
>> On Wed, Jul 27, 2016 at 1:04 PM, Andy Zhou <az...@ovn.org> wrote:
>>
>>> On Tue, Jul 26, 2016 at 6:20 PM, Russell Bryant <russ...@ovn.org> wrote:
>>>
>>>> On Tue, Jul 26, 2016 at 3:48 PM, Andy Zhou <az...@ovn.org> wrote:
>>>>
>>>>> On Tue, Jul 26, 2016 at 11:59 AM, Russell Bryant <russ...@ovn.org>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Jul 26, 2016 at 2:41 PM, Andy Zhou <az...@ovn.org> wrote:
>>>>>>
>>>>>>> On Tue, Jul 26, 2016 at 5:37 AM, Russell Bryant <russ...@ovn.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Mon, Jul 25, 2016 at 8:15 PM, Andy Zhou <az...@ovn.org> wrote:
>>>>>>>>
>>>>>>>>> Hi, Rayn and Russell,
>>>>>>>>>
>>>>>>>>
>>>>>>>> Can we move this discussion to the ovs dev mailing list?  Feel free
>>>>>>>> to just add it in a reply if you'd like.
>>>>>>>>
>>>>>>> Done.
>>>>>>>
>>>>>>>>
>>>>>>>>> I am wondering how we can actually use the active/backup feature
>>>>>>>>> that is now part of OVSDB to increase OVN availability.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To be clear, I haven't actually tried this yet.  I'm only speaking
>>>>>>>> about how I think it should work.
>>>>>>>>
>>>>>>>>> Specifically:
>>>>>>>>>
>>>>>>>>> 1. When the active OVSDB server fails, should the backup server
>>>>>>>>> take over and allow write transactions?  One simpler possibility
>>>>>>>>> is to allow read-only access to the backup server.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The backup server needs to take over.  It's OK if that requires
>>>>>>>> intervention by an HA manager like Pacemaker.  If we can't make the
>>>>>>>> passive server take over, I'd say the solution is incomplete.
>>>>>>>>
>>>>>>>
>>>>>>> O.K., makes sense.
>>>>>>>
>>>>>>> One possible issue with the backup server taking over is "split
>>>>>>> brain".  If, due to a network error, the backup server becomes
>>>>>>> disconnected from the active server, then we may have both servers
>>>>>>> thinking they are the active server.  Does Pacemaker help with
>>>>>>> solving this issue?
>>>>>>>
>>>>>>
>>>>>> It can, yes.  I would expect Pacemaker to explicitly configure a node
>>>>>> to be either the active or passive node.
>>>>>>
>>>>> Manual switching is more straightforward.  I agree.
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>> 2. When a crashed active OVSDB server recovers, should it become
>>>>>>>>> the new backup, or should it switch back?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Becoming the new backup is fine.  Again, this can be orchestrated
>>>>>>>> by an HA manager (Pacemaker).
>>>>>>>>
>>>>>>> I am not familiar with Pacemaker.  Can I assume it can provide a
>>>>>>> correct --sync-from argument (pointing to backup server) when
>>>>>>> relaunching the OVSDB server?
>>>>>>>
>>>>>>
>>>>>> Yes.  I'd have to consult with some Pacemaker experts on exactly what
>>>>>> the implementation would look like, but roughly:
>>>>>>
>>>>>> Pacemaker manages services using "OCF Resource Agents", which are
>>>>>> just scripts with a defined set of inputs and outputs for service
>>>>>> management.  I would imagine a Pacemaker cluster being told it must
>>>>>> have exactly 1 active and 1 passive OVSDB service.  When the passive
>>>>>> OVSDB service is started, it would include the "sync-from" argument
>>>>>> based on where the active OVSDB service is currently running.
>>>>>>
>>>>>> We really need to prototype this and document it.  I'm guessing too
>>>>>> much.  Pacemaker is frequently used to manage active/passive HA,
>>>>>> though.
>>>>>>
>>>>> Sounds reasonable.  I will work on ovsdb internal changes to support
>>>>> manual switching, using appctl commands, and then look into
>>>>> prototyping with HA systems.  I have not used Pacemaker in the past,
>>>>> so it may take some time to ramp up.
>>>>>
>>>>
>>>> I should be able to help.  We need to do this work anyway for
>>>> integration into OpenStack deployment tools.  Let me see if I can get
>>>> some helpful examples to follow.
>>>>
>>>
>>> Thanks for helping out.
>>>
>>> Given that, I now plan to work from the bottom up, initially focusing
>>> on ovsdb-server changes.
>>>
>>> 1. Add a state in ovsdb-server so that it knows whether it is an active
>>> server.  A backup server will not accept any connections.  A server
>>> started with the --sync-from argument will be put in the backup state
>>> by default.
>>>
>>> 2. Add appctl commands to allow manually switching state.
>>>
>>> 3. Add a new table for the backup server to register its address and
>>> ports.  OVSDB clients can learn about them at run time.  The backup
>>> server should issue a transaction to register its address before
>>> issuing the monitoring request.  This feature is not strictly
>>> necessary, and can be pushed to the HA manager, but having it built
>>> into ovsdb-server may make integration simpler.
>>>
>>> What do you think?
>>>
>>
>> Russell, would the HA manager also manage ovn-controller switchover?
>>
>
> Yes, indirectly.  The way this is typically handled is by using a virtual
> IP that moves to whatever host is currently the master.
>
Cool, then ovn-controller does not have to be HA aware.

>
> --
> Russell Bryant
>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
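
For readers following the thread, steps 1 and 2 of the plan quoted above (a role state in the server, plus a manual switch) can be pictured with a small toy model. This is only a sketch under stated assumptions: it is not ovsdb-server code, and every name in it (Role, ServerModel, promote, demote, failover) is hypothetical. The thread does not say what the appctl commands will be called, so none are shown. The one piece taken from the discussion is the rule that a server started with a sync source defaults to backup and that a backup does not accept connections.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    BACKUP = "backup"


class ServerModel:
    """Toy model of the proposed role state; hypothetical, not OVSDB code."""

    def __init__(self, name, sync_from=None):
        self.name = name
        self.sync_from = sync_from
        # Step 1 of the plan: a server started with a sync source
        # is put in the backup state by default.
        self.role = Role.BACKUP if sync_from else Role.ACTIVE

    def accepts_connections(self):
        # Step 1 of the plan: a backup server accepts no connections.
        return self.role is Role.ACTIVE

    def promote(self):
        # Step 2 of the plan: manual switch to active (hypothetical name).
        self.role = Role.ACTIVE
        self.sync_from = None

    def demote(self, sync_from):
        # Step 2 of the plan: manual switch to backup (hypothetical name).
        self.role = Role.BACKUP
        self.sync_from = sync_from


def failover(old_active, backup):
    """What an HA manager might do: promote the backup, then point the
    recovered former active at it as the new backup."""
    backup.promote()
    old_active.demote(sync_from=backup.name)


db1 = ServerModel("db1")
db2 = ServerModel("db2", sync_from="db1")
failover(db1, db2)
print(db1.role.value, db2.role.value)
```

Because each switch is an explicit call made by the HA manager rather than a decision a server takes when it loses contact with its peer, the model matches the "manual switching" approach agreed on in the thread for avoiding two servers both believing they are active.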