Hi guys, in case you are interested, here is a script that will do the
amphora upgrade automatically (well, not totally automatic; it needs two
inputs):

https://github.com/LingxianKong/octavia-stuff/blob/master/utils/octavia-upgrade-vms.py
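For context on what such a script has to automate: the failover-based
upgrade discussed downthread rests on a single neutron call that sets the
amphora's port administratively DOWN. A minimal illustrative sketch
(assuming python-neutronclient; this is not the contents of the script
above, and the credentials and port id are placeholders):

    # Trigger an amphora failover by flipping its neutron port DOWN.
    # Sketch only: auth values and the port id are placeholders.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    # The amphora's port id; in practice this is looked up from the
    # octavia records.
    amp_port_id = 'PORT-UUID'

    # Once the port is DOWN, the failure is eventually detected (via the
    # keepalived failure mode mentioned below) and the instance is
    # rebuilt from the currently tagged image.
    neutron.update_port(amp_port_id, {'port': {'admin_state_up': False}})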
Regards!
-----------------------------------
Lingxian Kong

On Fri, Jul 1, 2016 at 4:53 AM, Doug Wiegley <doug...@parksidesoftware.com> wrote:
>
>> On Jun 30, 2016, at 7:15 AM, Ihar Hrachyshka <ihrac...@redhat.com> wrote:
>>
>>>
>>> On 30 Jun 2016, at 01:16, Brandon Logan <brandon.lo...@rackspace.com> wrote:
>>>
>>> Hi Ihar, thanks for starting this discussion. Comments in-line.
>>>
>>> After writing my comments in line, I now realize that you might just
>>> be talking about documenting a way for a user to do this, and not
>>> have Octavia handle it at all. If that's the case I apologize for my
>>> reading comprehension, but I'll keep my comments in case I'm wrong.
>>> My brain is not working well today, sorry :(
>>
>> Right. All the mechanisms needed to apply the approach are already in
>> place in both Octavia and Neutron as of Mitaka. The question is mostly
>> about whether the team behind the project may endorse the alternative
>> approach, in addition to whatever the implementation does for
>> failovers, by giving it space in the official docs. I don't suggest
>> that the approach be the sole one documented, or that the Octavia team
>> needs to implement anything. [That said, it may be wise to look at
>> providing some smart scripts on top of the neutron/octavia APIs that
>> would realize the approach without putting the burden of multiple API
>> calls onto users.]
>
> I don't have a problem documenting it, but I also wouldn't personally
> want to recommend it.
>
> We're adding a layer of NAT, which has performance and HA implications
> of its own.
>
> We're adding FIPs, when the neutron advice for a "simple nova-net like
> deployment" is provider nets and linuxbridge, which don't support them.
>
> Thanks,
> doug
>
>
>>
>>>
>>> Thanks,
>>> Brandon
>>>
>>> On Wed, 2016-06-29 at 18:14 +0200, Ihar Hrachyshka wrote:
>>>> Hi all,
>>>>
>>>> I was looking lately at upgrades for octavia images. This includes
>>>> using new images for new loadbalancers, as well as for existing
>>>> balancers.
>>>>
>>>> For the first problem, the amp_image_tag option that I added in
>>>> Mitaka seems to do the job: all new balancers are created with the
>>>> latest image that is tagged properly.
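For illustration, retargeting that tag at a new image is a pair of glance
calls. A sketch, assuming the glance v2 client and a tag value of
'amphora' that matches amp_image_tag in octavia.conf; the endpoint, token
and image ids are placeholders:

    # Move the amphora tag from the old image to the new one so that
    # newly created load balancers boot from Image2. Sketch only:
    # endpoint, token and image ids are placeholders.
    from glanceclient import Client

    glance = Client('2', endpoint='http://controller:9292',
                    token='AUTH-TOKEN')

    glance.image_tags.update('IMAGE2-UUID', 'amphora')  # tag new image
    glance.image_tags.delete('IMAGE1-UUID', 'amphora')  # untag old one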
>>>> As for balancers that already exist, the only way to get them to
>>>> use a new image is to trigger an instance failure, which should
>>>> rebuild the failed nova instance using the new image. AFAIU the
>>>> failover process is not currently automated, requiring the user to
>>>> set the corresponding port to DOWN and wait for the failover to be
>>>> detected. I've heard there are plans to introduce a specific command
>>>> to trigger a quick failover, which would streamline the process and
>>>> reduce the time needed, because the failover would be detected and
>>>> processed immediately instead of waiting for the keepalived failure
>>>> mode to occur. Is it on the horizon? Patches to review?
>>>
>>> Not that I know of, and with all the work slated for Newton, I'm 99%
>>> sure it won't be done in Newton. Perhaps Ocata.
>>
>> I see. Do we maybe want to provide a smart script that would help to
>> trigger a failover with the neutron API? [detect the port id, set it
>> to DOWN, …]
>>
>>>>
>>>> While the approach seems rather promising and may be applicable for
>>>> some environments, I have several concerns about the failover
>>>> approach that we may want to address.
>>>>
>>>> 1. HA assumption. The approach assumes there is another node running
>>>> and available to serve requests while the instance is rebuilding.
>>>> For non-HA amphoras, that's not the case, meaning the image upgrade
>>>> process has significant downtime.
>>>>
>>>> 2. Even if we have HA, for the duration of the instance rebuild, the
>>>> balancer cluster is degraded to a single node.
>>>>
>>>> 3. (minor) During the upgrade phase, instances that belong to the
>>>> same HA amphora may run different versions of the image.
>>>>
>>>> What's the alternative?
>>>>
>>>> One idea I have been running with for some time is moving the
>>>> upgrade complexity one level up. Instead of making Octavia aware of
>>>> upgrade intricacies, allow it to do its job (load balance), while
>>>> using the neutron floating IP resource to flip a switch from an old
>>>> image to a new one. Let me elaborate.
>>> I'm not sure I like the idea of tying this to floating IPs, as there
>>> are deployers who do not use them. Then again, we currently depend on
>>> allowed address pairs, which is also an extension, but I suspect it's
>>> probably deployed in more places. I have no proof of this though.
>>
>> I guess you already deduced that, but just for the sake of
>> completeness: no, I don't suggest that octavia tie its backend to
>> FIPs. I merely suggest documenting the proposed approach as 'yet
>> another way of doing it', at least until we tackle the first two
>> concerns raised.
>>
>>>>
>>>> Let's say we have a load balancer LB1 that is running Image1. In
>>>> this scenario, we assume that access to the LB1 VIP is proxied
>>>> through a floating IP (FIP) that points to the LB1 VIP. Now, the
>>>> operator has uploaded a new Image2 to the glance registry and tagged
>>>> it for octavia usage. The user now wants to migrate the load
>>>> balancer function to the new image. To achieve this, the user
>>>> follows these steps:
>>>>
>>>> 1. create an independent clone of LB1 (let's call it LB2) that has
>>>> the exact same attributes (members) as LB1.
>>>> 2. once LB2 is up and ready to process requests incoming to its VIP,
>>>> redirect the FIP to the LB2 VIP (see the sketch after this list).
>>>> 3. now all new flows are immediately redirected to the LB2 VIP, with
>>>> no downtime (for new flows) due to the atomic nature of the FIP
>>>> update on the backend (we use iptables-save/iptables-restore to
>>>> update FIP rules on the router).
>>> Will this sever any existing connections? Is there a way to drain
>>> connections? Or is that already done?
>>
>> Not sure. Hopefully conntrack entries still apply until you shut down
>> the node or close all current sessions. I don't know of a way to
>> detect whether there are active sessions running. The safe fallback
>> would be giving the load balancer enough time for any connections to
>> die (a day?) before deprovisioning the old balancer.
>>
>>>> 4. since LB1 is no longer handling any flows, we can deprovision it.
>>>> LB2 is now the only balancer handling members.
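A sketch of the FIP redirect from step 2, assuming python-neutronclient;
the auth values, FIP address and port id are placeholders:

    # Repoint the floating IP from LB1's VIP to LB2's VIP. The update
    # is atomic on the router backend, so new flows switch over
    # immediately. Sketch only: all values are placeholders.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    # Find the floating IP currently pointing at LB1's VIP ...
    fip = neutron.list_floatingips(
        floating_ip_address='203.0.113.10')['floatingips'][0]

    # ... and repoint it at LB2's VIP port. From this moment new flows
    # land on LB2; LB1 can be deprovisioned once connections drain.
    neutron.update_floatingip(
        fip['id'], {'floatingip': {'port_id': 'LB2-VIP-PORT-UUID'}})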
>>>>
>>>> With that approach, 1) we provide consistent downtime expectations
>>>> regardless of the amphora architecture chosen (HA or not); 2) we
>>>> flip the switch only when the clone is up and ready, so there is no
>>>> degraded state for the balancer function; 3) all instances in an HA
>>>> amphora run the same image.
>>>>
>>>> Of course, it won't provide zero downtime for existing flows that
>>>> may already be handled by the balancer function. That's a limitation
>>>> that I believe is shared by all approaches currently on the table.
>>>>
>>>> As a side note, the approach would work for other lbaas drivers,
>>>> like namespaces, e.g. in case we want to update haproxy.
>>>>
>>>> Several questions in regards to the topic:
>>>>
>>>> 1. are there any drawbacks with the approach? can we consider it an
>>>> alternative way of doing image upgrades that could find its way into
>>>> the official documentation?
>>>
>>> Echoing my comment above: being tightly coupled to floating IPs is a
>>> drawback.
>>>
>>> Another way would be to make use of the allowed address pairs:
>>> 1) spin up a clone of the amp cluster for a loadbalancer, but don't
>>> bring up the VIP IP interface and don't start keepalived (or just
>>> prevent garping)
>>> 2) update the allowed address pairs of the clones to accept the VIP
>>> IP (see the sketch at the end of this mail)
>>> 3) bring the VIP IP interface up and start keepalived (or do a garp)
>>> 4) stop keepalived on the old cluster, take the interface down
>>> 5) deprovision the old cluster.
>>>
>>> I feel bad things can happen between 3 and 4 though. This is just a
>>> thought to play around with; I'm sure I'm not realizing some minute
>>> details that may cause this to not work. Plus, it's a bit more
>>> involved than the FIP solution you proposed.
>>
>> I think there is benefit in discussing how to make upgrades more
>> atomic. Pairs are indeed something to consider that would allow us to
>> proceed without introducing port replugging in neutron.
>>
>> Anyway, that's a lot more involved than either the FIP or the failover
>> approach, and would take a lot of time to properly plan for.
>>
>>>>
>>>> 2. if the answer is yes, then how can I contribute the piece? should
>>>> I sync with some other doc-related work that I know is currently
>>>> ongoing in the team?
>>>>
>>>> Ihar
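Finally, a sketch of step 2 of Brandon's allowed-address-pairs variant
above, assuming python-neutronclient; the auth values, ids and VIP
address are placeholders. Note that update_port replaces the whole
allowed_address_pairs list, so any existing pairs need to be re-included:

    # Allow the clone's port to carry the VIP address (step 2 of the
    # allowed-address-pairs sequence). Sketch only: all values are
    # placeholders, and this call overwrites any existing pairs.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    neutron.update_port(
        'CLONE-VRRP-PORT-UUID',
        {'port': {'allowed_address_pairs': [
            {'ip_address': '10.0.0.5'}]}})  # 10.0.0.5 = the VIP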