Maintaining a custom kernel is a big hassle, even if the change is only a few lines of code. Can we do something in userspace instead? What about the software watchdog that is already available? Along the lines of:

http://goo.gl/oO3Lzr
http://linux.die.net/man/8/watchdog
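A rough sketch of what that could look like on a system VM, using the standard watchdog daemon plus the softdog module. The thresholds and the repair-script idea are illustrative guesses, not something we ship or have tested:

modprobe softdog                      # software watchdog device, /dev/watchdog

cat > /etc/watchdog.conf <<'EOF'
watchdog-device = /dev/watchdog
interval        = 10        # seconds between keep-alives
min-memory      = 2560      # pages of free virtual memory below which we act
max-load-1      = 40        # 1-minute load average threshold
realtime        = yes       # keep the daemon itself out of swap
priority        = 1
# repair-binary could point at a script that does `poweroff -f`,
# if halting (so HA restarts the VM) is what we actually want.
EOF

service watchdog start

The daemon keeps /dev/watchdog fed as long as its checks pass; if memory or load goes past the thresholds, or the daemon itself gets starved, the box goes down without any kernel patch.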
On 9/5/13 7:13 AM, "Funs Kessen" <fkes...@schubergphilis.com> wrote:

>> Well, you can't, as far as I've looked in the source of panic.c. So I'm
>> thinking of investigating adding -1 as an option and seeing if I can
>> push halt in; let's hope the guys that do kernel stuff find this useful
>> too...
>
> So it seems the patch I conjured up for panic.c is seen as not so
> useful. There is, however, another way to achieve the same result: we
> load a crash kernel with our own .sh script as init to do our bidding.
>
> Would that be a plan?
>
> Cheers,
>
> Funs
>
> Sent from my iPhone
>
> On 4 sep. 2013, at 23:35, "Marcus Sorensen" <shadow...@gmail.com> wrote:
>
>> What would work as a quick fix for this sort of situation would be if
>> the machine could be configured to power off rather than rebooting on
>> OOM. Then the HA system would restart the VM, applying all configs.
>>
>> Anyone know how to do that? :-)
>>
>> On Wed, Sep 4, 2013 at 1:14 PM, Darren Shepherd
>> <darren.s.sheph...@gmail.com> wrote:
>>> On 09/04/2013 11:37 AM, Roeland Kuipers wrote:
>>>>
>>>> Hi Darren,
>>>>
>>>> Thanks for your reply! Could you share a bit more on your
>>>> plans/ideas?
>>>>
>>>> We have also been brainstorming other approaches to managing the
>>>> system VMs, especially small customizations for specific tenants,
>>>> and maybe even leveraging a config mgmt tool like Chef or Puppet,
>>>> with the ability to integrate CS with that in some way.
>>>
>>> I'll have to send the full details later, but here's a rough idea.
>>> The basic approach is this: logical changes to the VRs (or system VMs
>>> in general) get mapped to configuration items, so adding an LB rule
>>> maps to iptables config and haproxy config. When you change an LB
>>> rule, we bump up the requested version of the iptables/haproxy
>>> configuration. Say the requested version is now 4, while the applied
>>> version is still 3 because the VR has the old configuration. Since
>>> 4 != 3, the VR will be signaled to pull the latest iptables/haproxy
>>> config. Say in the meantime somebody else adds four other LB rules,
>>> so the requested version is now 8. When the VR pulls the config it
>>> will get version 8 and reply back saying it applied version 8. The
>>> applied version is now 8, which is greater than 4 (the version the
>>> first LB rule change was waiting for), so basically all async jobs
>>> waiting for the LB change will be done.
>>>
>>> To pull the configuration, the VR will hit a templating configuration
>>> system. It pulls the full iptables and haproxy config, not
>>> incremental changes.
>>>
>>> So if the VR ever reboots itself, it can easily just pull the latest
>>> config of everything and apply it, and it will be consistent.
>>>
>>> I'd be interested to hear what type of customizations you would like
>>> to add. It will definitely be an extensible system, but the problem
>>> is if your extension wants to touch the same configuration files that
>>> ACS wants to manage. That gets a bit tricky, as it's really easy for
>>> each to break the other. But I can definitely add some hooks that
>>> users can use to mess things up and "void the warranty."
>>>
>>> I've thought about Chef and Puppet for this, but basically it comes
>>> down to two things: I'm really interested in this being fast and
>>> lightweight, and Ruby is neither of those. So the core ACS stuff will
>>> probably remain very simple shell scripts.
>>> Simple in that they really just need to download configuration and
>>> restart services; they know nothing about the nature of the changes.
>>> If, as an extension, you want to do something with Puppet or Chef,
>>> I'd be open to that. That's your deal.
>>>
>>> This approach has many other benefits. For example, we can ensure
>>> that as we deploy a new ACS release, existing system VMs can be
>>> updated (without a reboot, unless the kernel changes). Additionally,
>>> it's fast, and updates happen in near-constant time: most changes
>>> will take just a couple of seconds, even if you have 4000 LB rules.
>>>
>>> Darren
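Two rough sketches to make the above concrete; everything in them (paths, endpoints, thresholds) is illustrative, not existing ACS behavior.

First, on Marcus' power-off-on-OOM question: this seems to be as far as a stock kernel goes with sysctls alone. It turns an OOM into a panic and then reboots after a delay; an actual halt/poweroff would still need something like Funs' panic.c change or the crash-kernel-with-a-custom-init trick.

cat >> /etc/sysctl.conf <<'EOF'
vm.panic_on_oom = 1    # turn an OOM condition into a kernel panic
kernel.panic    = 10   # reboot 10 seconds after a panic (0 = hang forever)
EOF
sysctl -p

Second, a minimal sketch of the VR-side "pull and apply" step Darren describes, as a shell script. The config service URL and endpoints are made up for illustration.

#!/bin/sh
# Compare the requested config version against what this VR last applied;
# if they differ, pull the *full* rendered config (not a diff), apply it,
# and report back the version we ended up on.
MGMT="http://mgmt-server:8080/config"        # hypothetical templating config service
APPLIED_FILE=/var/cache/cloud/applied-version

requested=$(curl -s "$MGMT/requested-version")
applied=$(cat "$APPLIED_FILE" 2>/dev/null || echo 0)

if [ "$requested" != "$applied" ]; then
    curl -s "$MGMT/iptables" > /etc/iptables/rules.v4
    curl -s "$MGMT/haproxy"  > /etc/haproxy/haproxy.cfg
    iptables-restore < /etc/iptables/rules.v4
    service haproxy reload
    echo "$requested" > "$APPLIED_FILE"
    # Any async job waiting on a version <= $requested can now complete.
    curl -s -X POST "$MGMT/applied-version" -d "$requested"
fi

Because the VR always pulls the full config, the same script covers the reboot case: on boot it runs once and converges to whatever the latest requested version is.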