Maintaining a custom kernel is a big hassle, even if the change is only a few lines of code. Can we do something in userspace instead? What about the software watchdog that is already available? Along the lines of:

http://goo.gl/oO3Lzr
http://linux.die.net/man/8/watchdog
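A rough sketch of what that could look like on a system VM, using the standard watchdog daemon plus the softdog module. The thresholds and the repair-script idea are illustrative guesses, not something we ship or have tested:

modprobe softdog                      # software watchdog device, /dev/watchdog

cat > /etc/watchdog.conf <<'EOF'
watchdog-device = /dev/watchdog
interval        = 10        # seconds between keep-alives
min-memory      = 2560      # pages of free virtual memory below which we act
max-load-1      = 40        # 1-minute load average threshold
realtime        = yes       # keep the daemon itself out of swap
priority        = 1
# repair-binary could point at a script that does `poweroff -f`,
# if halting (so HA restarts the VM) is what we actually want.
EOF

service watchdog start

The daemon keeps /dev/watchdog fed as long as its checks pass; if memory or load goes past the thresholds, or the daemon itself gets starved, the box goes down without any kernel patch.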
On 9/5/13 7:13 AM, "Funs Kessen" <fkes...@schubergphilis.com> wrote:

>> Well, you can't, as far as I've looked in the source of panic.c. So I'm
>> thinking of investigating adding -1 as an option and seeing if I can
>> push halt in; let's hope the guys that do kernel stuff find this useful
>> too...
>
> So it seems the patch I conjured up for panic.c is seen as not so
> useful. There is, however, another way to achieve the same result: we
> load a crash kernel with our own .sh script as init to do our bidding.
>
> Would that be a plan?
>
> Cheers,
>
> Funs
>
> Sent from my iPhone
>
> On 4 sep. 2013, at 23:35, "Marcus Sorensen" <shadow...@gmail.com> wrote:
>
>> What would work as a quick fix for this sort of situation would be if
>> the machine could be configured to power off rather than rebooting on
>> OOM. Then the HA system would restart the VM, applying all configs.
>>
>> Anyone know how to do that? :-)
>>
>> On Wed, Sep 4, 2013 at 1:14 PM, Darren Shepherd
>> <darren.s.sheph...@gmail.com> wrote:
>>> On 09/04/2013 11:37 AM, Roeland Kuipers wrote:
>>>>
>>>> Hi Darren,
>>>>
>>>> Thanks for your reply! Could you share a bit more on your
>>>> plans/ideas?
>>>>
>>>> We have also been brainstorming other approaches to managing the
>>>> system VMs, especially small customizations for specific tenants,
>>>> and maybe even leveraging a config mgmt tool like Chef or Puppet,
>>>> with the ability to integrate CS with that in some way.
>>>
>>> I'll have to send the full details later, but here's a rough idea.
>>> The basic approach is this: logical changes to the VRs (or system VMs
>>> in general) get mapped to configuration items, so adding an LB rule
>>> maps to iptables config and haproxy config. When you change an LB
>>> rule, we bump up the requested version of the iptables/haproxy
>>> configuration. Say the requested version is now 4, while the applied
>>> version is still 3 because the VR has the old configuration. Since
>>> 4 != 3, the VR will be signaled to pull the latest iptables/haproxy
>>> config. Say in the meantime somebody else adds four other LB rules,
>>> so the requested version is now 8. When the VR pulls the config it
>>> will get version 8 and reply back saying it applied version 8. The
>>> applied version is now 8, which is greater than 4 (the version the
>>> first LB rule change was waiting for), so basically all async jobs
>>> waiting for the LB change will be done.
>>>
>>> To pull the configuration, the VR will hit a templating configuration
>>> system. It pulls the full iptables and haproxy config, not
>>> incremental changes.
>>>
>>> So if the VR ever reboots itself, it can easily just pull the latest
>>> config of everything and apply it, and it will be consistent.
>>>
>>> I'd be interested to hear what type of customizations you would like
>>> to add. It will definitely be an extensible system, but the problem
>>> is if your extension wants to touch the same configuration files that
>>> ACS wants to manage. That gets a bit tricky, as it's really easy for
>>> each to break the other. But I can definitely add some hooks that
>>> users can use to mess things up and "void the warranty."
>>>
>>> I've thought about Chef and Puppet for this, but basically it comes
>>> down to two things: I'm really interested in this being fast and
>>> lightweight, and Ruby is neither of those. So the core ACS stuff will
>>> probably remain very simple shell scripts.
>>> Simple in that they really just need to download configuration and
>>> restart services; they know nothing about the nature of the changes.
>>> If, as an extension, you want to do something with Puppet or Chef,
>>> I'd be open to that. That's your deal.
>>>
>>> This approach has many other benefits. For example, we can ensure
>>> that as we deploy a new ACS release, existing system VMs can be
>>> updated (without a reboot, unless the kernel changes). Additionally,
>>> it's fast, and updates happen in near-constant time: most changes
>>> will take just a couple of seconds, even if you have 4000 LB rules.
>>>
>>> Darren
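Two rough sketches to make the above concrete; everything in them (paths, endpoints, thresholds) is illustrative, not existing ACS behavior.

First, on Marcus' power-off-on-OOM question: this seems to be as far as a stock kernel goes with sysctls alone. It turns an OOM into a panic and then reboots after a delay; an actual halt/poweroff would still need something like Funs' panic.c change or the crash-kernel-with-a-custom-init trick.

cat >> /etc/sysctl.conf <<'EOF'
vm.panic_on_oom = 1    # turn an OOM condition into a kernel panic
kernel.panic    = 10   # reboot 10 seconds after a panic (0 = hang forever)
EOF
sysctl -p

Second, a minimal sketch of the VR-side "pull and apply" step Darren describes, as a shell script. The config service URL and endpoints are made up for illustration.

#!/bin/sh
# Compare the requested config version against what this VR last applied;
# if they differ, pull the *full* rendered config (not a diff), apply it,
# and report back the version we ended up on.
MGMT="http://mgmt-server:8080/config"        # hypothetical templating config service
APPLIED_FILE=/var/cache/cloud/applied-version

requested=$(curl -s "$MGMT/requested-version")
applied=$(cat "$APPLIED_FILE" 2>/dev/null || echo 0)

if [ "$requested" != "$applied" ]; then
    curl -s "$MGMT/iptables" > /etc/iptables/rules.v4
    curl -s "$MGMT/haproxy"  > /etc/haproxy/haproxy.cfg
    iptables-restore < /etc/iptables/rules.v4
    service haproxy reload
    echo "$requested" > "$APPLIED_FILE"
    # Any async job waiting on a version <= $requested can now complete.
    curl -s -X POST "$MGMT/applied-version" -d "$requested"
fi

Because the VR always pulls the full config, the same script covers the reboot case: on boot it runs once and converges to whatever the latest requested version is.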