Dima, Kevin,

There are PreStop hooks that can be used to gracefully bring down stuff
running in containers:
http://kubernetes.io/docs/user-guide/container-environment/
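(Independently of a preStop hook, Kubernetes also sends SIGTERM to the
container on pod termination and waits terminationGracePeriodSeconds before
killing it.) As a rough sketch only - not actual Nova/Neutron code, with a
made-up endpoint, topic and server name - a service built on oslo.messaging
could handle that signal by stopping its RPC server before exiting, which is
what lets the fanout/reply queues be cleaned up instead of lingering until
they expire:

    import signal
    import time

    from oslo_config import cfg
    import oslo_messaging


    class DemoEndpoint(object):
        # Hypothetical endpoint so the server has something to serve.
        def ping(self, ctxt, arg):
            return arg


    shutdown_requested = False


    def handle_signal(signum, frame):
        # Just flag the request; do the actual teardown in the main loop.
        global shutdown_requested
        shutdown_requested = True


    def main():
        transport = oslo_messaging.get_transport(cfg.CONF)
        target = oslo_messaging.Target(topic='demo-topic', server='demo-host')
        server = oslo_messaging.get_rpc_server(transport, target,
                                               [DemoEndpoint()])

        signal.signal(signal.SIGTERM, handle_signal)
        signal.signal(signal.SIGINT, handle_signal)

        server.start()
        while not shutdown_requested:
            time.sleep(1)

        # Graceful shutdown: stop consuming, wait for in-flight requests,
        # then clean up the transport, so the transient queues go away
        # instead of sitting in RabbitMQ until rabbit_transient_queues_ttl.
        server.stop()
        server.wait()
        transport.cleanup()


    if __name__ == '__main__':
        main()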
--
Dims

On Thu, Jul 28, 2016 at 8:22 AM, Dmitry Mescheryakov
<[email protected]> wrote:
>
> 2016-07-26 21:20 GMT+03:00 Fox, Kevin M <[email protected]>:
>>
>> It only relates to Kubernetes in that Kubernetes can do automatic rolling
>> upgrades by destroying/replacing a service. If the services don't clean up
>> after themselves, then performing a rolling upgrade will break things.
>>
>> So, what do you think is the best approach to ensuring all the services
>> shut things down properly? Seems like it's a cross-project issue? Should a
>> spec be submitted?
>
> I think that it would be fair if Kubernetes sends a SIGTERM to the OpenStack
> service in a container, then waits for the service to shut down and only then
> destroys the container.
>
> It might not be very important for our case though, if we agree to split
> expiration time for fanout and reply queues. And I don't know of any other
> case where an OpenStack service needs to clean up on shutdown in some
> external place.
>
> Thanks,
>
> Dmitry
>
>>
>> Thanks,
>> Kevin
>> ________________________________
>> From: Dmitry Mescheryakov [[email protected]]
>> Sent: Tuesday, July 26, 2016 11:01 AM
>> To: Fox, Kevin M
>> Cc: Sam Morrison; OpenStack Operators
>> Subject: Re: [Openstack-operators] [oslo] RabbitMQ queue TTL issues moving
>> to Liberty
>>
>> 2016-07-25 18:47 GMT+03:00 Fox, Kevin M <[email protected]>:
>>>
>>> Ah. Interesting.
>>>
>>> The graceful shutdown would really help the Kubernetes situation too.
>>> Kubernetes can do easy rolling upgrades, and having the processes be able
>>> to clean up after themselves as they are upgraded is important. Is this
>>> something that needs to go into oslo.messaging, or does it have to be added
>>> to all projects using it?
>>
>> It needs to be fixed both on the oslo.messaging side (delete the fanout queue
>> on RPC server stop, which is done by Kirill's CR) and on the side of projects
>> using it, as they need to actually stop the RPC server before shutting down.
>> As I wrote earlier, among Neutron processes right now only the openvswitch
>> and metadata agents do not stop the RPC server.
>>
>> I am not sure how that relates to Kubernetes, as I am not very familiar with
>> it.
>>
>> Thanks,
>>
>> Dmitry
>>
>>>
>>> Thanks,
>>> Kevin
>>> ________________________________
>>> From: Dmitry Mescheryakov [[email protected]]
>>> Sent: Monday, July 25, 2016 3:47 AM
>>> To: Sam Morrison
>>> Cc: OpenStack Operators
>>> Subject: Re: [Openstack-operators] [oslo] RabbitMQ queue TTL issues
>>> moving to Liberty
>>>
>>> Sam,
>>>
>>> For your case I would suggest lowering rabbit_transient_queues_ttl until
>>> you are comfortable with the volume of messages that arrives during that
>>> time. Setting the parameter to 1 will essentially replicate the behaviour of
>>> auto_delete queues. But I would suggest not setting it that low, as otherwise
>>> your OpenStack will suffer from the original bug. Probably a value like 20
>>> seconds should work in most cases.
>>>
>>> I think that there is space for improvement here - we can delete reply
>>> and fanout queues on graceful shutdown. But I am not sure if it will be easy
>>> to implement, as it requires services (Nova, Neutron, etc.) to stop the RPC
>>> server on SIGINT, and I don't know if they do that right now.
>>>
>>> I don't think we can make the SIGKILL case any better. Other than that,
>>> the issue could be investigated on the Neutron side; maybe the number of
>>> messages could be reduced there.
>>>
>>> Thanks,
>>>
>>> Dmitry
>>>
>>> 2016-07-25 9:27 GMT+03:00 Sam Morrison <[email protected]>:
>>>>
>>>> We recently upgraded to Liberty and have come across some issues with
>>>> queue build-ups.
>>>>
>>>> This is due to changes in rabbit to set queue expiries as opposed to
>>>> queue auto-delete.
>>>> See https://bugs.launchpad.net/oslo.messaging/+bug/1515278 for more
>>>> information.
>>>>
>>>> The fix for this bug is in Liberty and it does fix one issue, however it
>>>> causes another one.
>>>>
>>>> Every time you restart something that has a fanout queue (e.g.
>>>> cinder-scheduler or the neutron agents) you will have
>>>> a queue in rabbit that is still bound to the rabbitmq exchange (and so
>>>> still getting messages in) but has no consumers.
>>>>
>>>> The messages in these queues are basically rubbish and don't need to
>>>> exist. Rabbit will delete these queues after 10 mins (although the default
>>>> in master is now changed to 30 mins).
>>>>
>>>> During this time the queue will grow and grow with messages. This sets
>>>> off our nagios alerts and our ops guys have to deal with something that
>>>> isn't really an issue. They basically delete the queue.
>>>>
>>>> A bad scenario is when you make a change to your cloud that means all
>>>> your 1000 neutron agents are restarted; this causes a couple of dead queues
>>>> per agent to hang around (port updates and security group updates). We get
>>>> around 25 messages/second on these queues, so you can see that after 10
>>>> minutes we have a ton of messages in these queues.
>>>>
>>>> 1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes, to be precise.
>>>>
>>>> Has anyone else been suffering from this, before I raise a bug?
>>>>
>>>> Cheers,
>>>> Sam

--
Davanum Srinivas :: https://twitter.com/dims

_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
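Sam's workaround above ("they basically delete the queue") can also be
scripted. A minimal sketch, assuming the RabbitMQ management plugin is
reachable on localhost:15672, with made-up credentials and a made-up message
threshold; the reply_*/*_fanout_* name check is only a heuristic for
oslo.messaging's transient queues:

    import requests

    # Assumptions for illustration only: management API on localhost,
    # default guest credentials, and an arbitrary message threshold.
    BASE = 'http://localhost:15672/api'
    AUTH = ('guest', 'guest')
    MIN_MESSAGES = 1000


    def stale_transient_queues():
        # /api/queues lists every queue with its consumer and message counts.
        resp = requests.get(BASE + '/queues', auth=AUTH)
        resp.raise_for_status()
        for q in resp.json():
            name = q.get('name', '')
            looks_transient = name.startswith('reply_') or '_fanout_' in name
            if (looks_transient and q.get('consumers', 0) == 0
                    and q.get('messages', 0) >= MIN_MESSAGES):
                yield q['vhost'], name


    def delete_queue(vhost, name):
        # The vhost must be URL-encoded ('/' becomes %2F).
        encoded = requests.utils.quote(vhost, safe='')
        resp = requests.delete('%s/queues/%s/%s' % (BASE, encoded, name),
                               auth=AUTH)
        resp.raise_for_status()


    if __name__ == '__main__':
        for vhost, name in stale_transient_queues():
            print('deleting %s (no consumers)' % name)
            delete_queue(vhost, name)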
