Good work, Rohit. I'll review PR 2508:
https://github.com/apache/cloudstack/pull/2508

On Tue, May 1, 2018 at 12:08 PM, Rohit Yadav <rohit.ya...@shapeblue.com> wrote:

> All,
>
> A short-term solution to VR upgrade or network restart (with cleanup=true)
> has been implemented:
>
> - The strategy for redundant VRs builds on top of Wei's original patch,
>   where backup routers are removed and replaced on a rolling basis. The
>   downtime I saw was usually 0-2 seconds, and theoretically downtime is
>   a maximum of [0, 3*advertisement interval + skew seconds] or 0-10
>   seconds (with CloudStack's default of 1s advertisement interval).
>
> - For non-redundant routers, I've implemented a strategy where first a
>   new VR is deployed, then the old VR is powered off/destroyed, and the
>   new VR is re-programmed again. With this strategy, two identical VRs
>   may be up for a brief moment (a few seconds) where both can serve
>   traffic; however, the new VR performs an arp-ping on its interfaces to
>   update neighbours. After the old VR is removed, the new VR is
>   re-programmed, which among many things performs another arp-ping. The
>   theoretical downtime is therefore limited by the ARP cache refresh,
>   which can take up to 30 seconds. In my experiments against various
>   VMware, KVM and XenServer versions, I found that the downtime was
>   indeed less than 30s, usually between 5 and 20 seconds. Compared to
>   older ACS versions, especially in cases where VR deployment requires a
>   full volume copy (as in VMware), a 10x-12x improvement was seen.
>
> Please review and test the following PR, which has test details,
> benchmarks, and some screenshots:
>
> https://github.com/apache/cloudstack/pull/2508
>
> Future work can be driven towards making all VRs redundant-enabled by
> default, which can allow for a firewall+connections state transfer
> (conntrackd + VRRP2/3 based) during rolling reboots.
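
For anyone checking the redundant-VR bound quoted above, here is a minimal sketch of the timer arithmetic. It assumes the VRRPv2 definitions from RFC 3768 (skew time = (256 - priority) / 256 seconds, master-down interval = 3 * advertisement interval + skew time). The function names and the example priorities are mine, purely for illustration; they are not taken from the PR or from CloudStack's keepalived configuration.

```python
def skew_time(priority: int) -> float:
    """VRRPv2 skew time in seconds, per RFC 3768: (256 - priority) / 256."""
    # Assumption: we only model backup routers, whose priority is 1..254.
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be in 1..254")
    return (256 - priority) / 256


def master_down_interval(advert_interval: float, priority: int) -> float:
    """Seconds a backup waits without advertisements before taking over."""
    if advert_interval <= 0:
        raise ValueError("advertisement interval must be positive")
    return 3 * advert_interval + skew_time(priority)


# Hypothetical priorities, with the 1s advertisement interval from the mail.
for example_priority in (1, 100, 254):
    print(example_priority, master_down_interval(1.0, example_priority))
```

Under these assumptions the timer-driven part of a failover stays below 4 seconds for a 1s advertisement interval, so anything beyond that in the quoted 0-10 second range would have to come from factors outside the VRRP timer itself.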
>
> - Rohit
>
> ________________________________
> From: Daan Hoogland <daan.hoogl...@gmail.com>
> Sent: Thursday, February 8, 2018 3:11:51 PM
> To: dev
> Subject: Re: [DISCUSS] VR upgrade downtime reduction
>
> to stop the vote and continue the discussion. I personally want
> unification of all router VMs: VR, 'shared network', rVR, VPC, rVPC, and
> eventually the one we want to create for 'enterprise topology hand-off
> points'. And I think we have some level of consensus on that, but the
> path there is a concern for Wido and for some of my colleagues as well,
> and rightly so. One issue is upgrades from older versions.
>
> I see the common scenario as follows:
> + redundancy is deprecated and only the number of instances remains.
> + an old VR is replicated in memory by a redundant-enabled version, which
>   will be in a state of running but inactive.
> - the old one will be destroyed while a ping is running
> - as soon as the ping fails more than three times in a row (this might
>   have to have a hypervisor-specific implementation or require a helper
>   VM)
> + the new one is activated
>
> after this upgrade Wei's and/or Remi's code will do the work for any
> following upgrade.
>
> flames, please
>
> On Wed, Feb 7, 2018 at 12:17 PM, Nux! <n...@li.nux.ro> wrote:
>
> > +1 too
> >
> > --
> > Sent from the Delta quadrant using Borg technology!
> >
> > Nux!
> > www.nux.ro
> >
> > ----- Original Message -----
> > > From: "Rene Moser" <m...@renemoser.net>
> > > To: "dev" <dev@cloudstack.apache.org>
> > > Sent: Wednesday, 7 February, 2018 10:11:45
> > > Subject: Re: [DISCUSS] VR upgrade downtime reduction
> > >
> > > On 02/06/2018 02:47 PM, Remi Bergsma wrote:
> > >> Hi Daan,
> > >>
> > >> In my opinion the biggest issue is the fact that there are a lot of
> > >> different code paths: VPC versus non-VPC, VPC versus redundant-VPC,
> > >> etc. That's why you cannot simply switch from a single VPC to a
> > >> redundant VPC for example.
> > >>
> > >> For SBP, we mitigated that in Cosmic by converting all non-VPCs to a
> > >> VPC with a single tier and made sure all features are supported.
> > >> Next we merged the single and redundant VPC code paths. The idea
> > >> here is that redundancy or not should only be a difference in the
> > >> number of routers. Code should be the same. A single router, is also
> > >> "master" but there just is no "backup".
> > >>
> > >> That simplifies things A LOT, as keepalived is now the master of the
> > >> whole thing. No more assigning ip addresses in Python, but leave
> > >> that to keepalived instead. Lots of code deleted. Easier to
> > >> maintain, way more stable. We just released Cosmic 6 that has this
> > >> feature and are now rolling it out in production. Looking good so
> > >> far. This change unlocks a lot of possibilities, like live upgrading
> > >> from a single VPC to a redundant one (and back). In the end, if the
> > >> redundant VPC is rock solid, you most likely don't even want single
> > >> VPCs any more. But that will come.
> > >>
> > >> As I said, we're rolling this out as we speak. In a few weeks when
> > >> everything is upgraded I can share what we learned and how well it
> > >> works.
> > >> CloudStack could use a similar approach.
> > >
> > > +1 Pretty much this.
> > >
> > > René
>
> --
> Daan

--
Daan