Been thinking about this a little more. From my experience with embedded
programming, memory overcommit is usually not allowed (the RAM is sized
appropriately for the expected workload). So, throwing this out there:
should we set /proc/sys/vm/overcommit_memory = 2 so that the kernel does
not allow overcommit? With strict accounting, a user-space task that asks
for more memory than is available simply fails its allocation (and most
likely dies), instead of the OOM killer picking an arbitrary victim later.
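Something like this on the system VM (a sketch; the 80% ratio is just a
starting point, tune to taste):

  # at runtime:
  sysctl -w vm.overcommit_memory=2   # strict accounting: allocations past
                                     # CommitLimit fail instead of the OOM
                                     # killer firing later
  sysctl -w vm.overcommit_ratio=80   # CommitLimit = swap + 80% of RAM
                                     # (kernel default is 50)

  # persisted in /etc/sysctl.conf so it survives reboots:
  vm.overcommit_memory = 2
  vm.overcommit_ratio = 80

The trade-off is that daemons have to handle malloc() returning NULL
gracefully, which not all of them do.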
On 9/6/13 2:35 AM, "Funs Kessen" <fkes...@schubergphilis.com> wrote:

>Hi Alex and Chiradeep,
>
>@Alex: Yes, it would work, but it also means that everybody would have to
>implement this on a machine that runs syslog, and that it is not part of
>CloudStack. I think it would be wonderful to have the SystemVM, as an
>entity within CloudStack, combined with CloudStack, be self-sustaining
>and not depend on external scripts that do API calls. For the short term
>it might be a viable solution, but in the long term it would feel kind of
>hack-ish?
>
>@Chiradeep: I agree; it was also not acceptable to some of the guys on a
>Linux kernel IRC channel, and they had fair points, although I do believe
>people should have the option to choose. They pointed me towards kdump,
>like I mentioned before. Yesterday I tested kdump and it works. It means
>that a bit of memory is reserved to load a crash kernel with an "adapted"
>init that does a poweroff the moment the crash kernel is loaded; it also
>means we can save the core and analyze why it crashed before powering
>off, if required. The watchdog functionality is something I found too,
>but I didn't feel comfortable with it somehow. I'll have a deeper look at
>it to see if it does the trick, so thanks for bringing it up!
>
>Cheers,
>
>Funs
>
>
>-----Original Message-----
>From: Alex Huang [mailto:alex.hu...@citrix.com]
>Sent: Friday, September 6, 2013 2:05
>To: dev@cloudstack.apache.org; Marcus Sorensen
>Cc: Roeland Kuipers; int-cloud
>Subject: RE: [DISCUSS] OOM killer and Routing/System VM's = :(
>
>If I recall correctly, the OOM killer actually prints something into
>syslog, so a cron job that watches syslog and simply shuts down the VM
>should work.
>
>--Alex
>
>> -----Original Message-----
>> From: Chiradeep Vittal [mailto:chiradeep.vit...@citrix.com]
>> Sent: Thursday, September 5, 2013 12:48 PM
>> To: dev@cloudstack.apache.org; Marcus Sorensen
>> Cc: Roeland Kuipers; int-cloud
>> Subject: Re: [DISCUSS] OOM killer and Routing/System VM's = :(
>>
>> Maintaining a custom kernel is a big hassle, even if it is a few lines
>> of code change.
>> Can we do something in userspace? What about the software watchdog
>> that is available?
>> Along the lines of: http://goo.gl/oO3Lzr
>> http://linux.die.net/man/8/watchdog
>>
>>
>> On 9/5/13 7:13 AM, "Funs Kessen" <fkes...@schubergphilis.com> wrote:
>>
>> >> Well, you can't, as far as I can tell from the source of panic.c.
>> >> So I'm thinking of investigating adding -1 as an option and seeing
>> >> if I can push halt in; let's hope the guys that do kernel stuff
>> >> find this useful too...
>> >>
>> > So it seems the patch I conjured up for panic.c is seen as not so
>> > useful; there is, however, another way to achieve the same result.
>> > It would mean that we load a crash kernel with our own .sh script as
>> > init to do our bidding.
>> >
>> > Would that be a plan?
>> >
>> > Cheers,
>> >
>> > Funs
>> >
>> > Sent from my iPhone
>> >
>> > On 4 sep. 2013, at 23:35, "Marcus Sorensen" <shadow...@gmail.com>
>> > wrote:
>> >
>> >> What would work as a quick fix for this sort of situation would be
>> >> if the machine could be configured to power off rather than
>> >> rebooting on OOM. Then the HA system would restart the VM, applying
>> >> all configs.
>> >>
>> >> Anyone know how to do that? :-)
>> >>
>> >> On Wed, Sep 4, 2013 at 1:14 PM, Darren Shepherd
>> >> <darren.s.sheph...@gmail.com> wrote:
>> >>> On 09/04/2013 11:37 AM, Roeland Kuipers wrote:
>> >>>>
>> >>>> Hi Darren,
>> >>>>
>> >>>> Thanks for your reply! Could you share a bit more on your
>> >>>> plans/ideas?
>> >>>>
>> >>>> We have also been brainstorming on other approaches to managing
>> >>>> the systemvm's, especially small customizations for specific
>> >>>> tenants, and maybe even leveraging config mgmt tools like chef or
>> >>>> puppet, with the ability to integrate CS with that in some way.
>> >>>
>> >>> I'll have to send the full details later, but here's a rough idea.
>> >>> The basic approach is this: logical changes to the VRs (or system
>> >>> vms in general) get mapped to configuration items. So adding an LB
>> >>> rule maps to iptables config and haproxy config. When you change an
>> >>> LB rule, we bump up the requested version of the configuration for
>> >>> iptables/haproxy. Say the requested version is now 4; the applied
>> >>> version will still be 3, as the VR has the old configuration.
>> >>> Since 4 != 3, the VR will be signaled to pull the latest
>> >>> iptables/haproxy config. Say in the meantime somebody else adds
>> >>> four other LB rules, so the requested version is now at 8. When the
>> >>> VR pulls the config it will get version 8, and then reply back
>> >>> saying it applied version 8. The applied version is now 8, which is
>> >>> greater than 4 (the version the first LB rule change was waiting
>> >>> for), so basically all async jobs waiting for the LB change will be
>> >>> done.
>> >>>
>> >>> To pull the configuration, the VR will hit a templating
>> >>> configuration system. It pulls the full iptables and haproxy
>> >>> config, not incremental changes.
>> >>>
>> >>> So if the VR ever reboots itself, it can easily just pull the
>> >>> latest config of everything and apply it. So it will be consistent.
>> >>>
>> >>> I'd be interested to hear what type of customizations you would
>> >>> like to add. It will definitely be an extensible system, but the
>> >>> problem is if your extensions want to touch the same configuration
>> >>> files that ACS wants to manage. That gets a bit tricky, as it's
>> >>> really easy for each to break the other. But I can definitely add
>> >>> some hooks that users can use to mess up things and "void the
>> >>> warranty."
>> >>>
>> >>> I've thought about chef and puppet for this, but basically it comes
>> >>> down to two things: I'm really interested in this being fast and
>> >>> lightweight, and Ruby is neither of those. So the core ACS stuff
>> >>> will probably remain very simple shell scripts. Simple in that they
>> >>> really just need to download configuration and restart services;
>> >>> they know nothing about the nature of the changes. If, as an
>> >>> extension, you want to do something with puppet or chef, I'd be
>> >>> open to that. That's your deal.
>> >>>
>> >>> This approach has many other benefits. For example, we can ensure
>> >>> that as we deploy a new ACS release, existing system VMs can be
>> >>> updated (without a reboot, unless the kernel changes).
>> >>> Additionally, it's fast, and updates happen in near-constant time.
>> >>> So most changes will be just a couple of seconds, even if you have
>> >>> 4000 LB rules.
>> >>>
>> >>> Darren
>> >>>
>
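For what it's worth, the VR-side half of what Darren describes could stay
as small as a shell script like this (a sketch only; every path, port and
endpoint below is invented for illustration, nothing here ships in ACS):

  #!/bin/sh
  # Poll the requested config version; if it is ahead of what this VR has
  # applied, pull the full rendered config, apply it, and ack the version.
  STATE=/var/cache/cloud/applied_version        # hypothetical state file
  CFG=http://192.168.0.1:8080/vr-config        # hypothetical config service

  applied=$(cat "$STATE" 2>/dev/null || echo 0)
  requested=$(curl -sf "$CFG/requested_version") || exit 1

  if [ "$requested" -gt "$applied" ]; then
      # always pull the *full* rendered config, never a delta, so the same
      # script also converges after a reboot or a missed signal
      curl -sf "$CFG/$requested/iptables.rules" | iptables-restore
      curl -sf "$CFG/$requested/haproxy.cfg" > /etc/haproxy/haproxy.cfg
      service haproxy reload
      echo "$requested" > "$STATE"
      # ack with the version actually applied; any async job waiting on a
      # version <= this one can now complete
      curl -sf -X POST "$CFG/applied_version" -d "version=$requested"
  fi

The script knows nothing about individual LB rules, which is exactly why
it stays fast no matter how many of them there are.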