Thx for putting this together, Jayapal. A few comments:

I'd really like to have a config flag to specify if things should be restarted automatically or not. Worst case, track the restarts - if a service is restarted more than X times in Y seconds, something's obviously wrong so stop tail-chasing[1]. Personally I'm much more interested in knowing there's a problem and then taking whatever happens to be the appropriate actions for our situation.
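To make the "more than X times in Y seconds" cap concrete, here's a rough Python sketch of a sliding-window restart limiter. It's only an illustration under my own assumptions, not anything that exists in ACS: `RestartLimiter`, `max_restarts` (the X) and `window_seconds` (the Y) are made-up names, and the config flag itself isn't shown.

```python
import time
from collections import deque


class RestartLimiter:
    """Hypothetical helper: allow at most max_restarts per window_seconds."""

    def __init__(self, max_restarts, window_seconds, clock=time.monotonic):
        self.max_restarts = max_restarts      # the "X" above
        self.window_seconds = window_seconds  # the "Y" above
        self.clock = clock                    # injectable, so tests need no sleeping
        self._restarts = deque()              # timestamps of recent restarts

    def allow_restart(self):
        """Record a restart and return True, or return False if the cap is hit."""
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self._restarts and now - self._restarts[0] >= self.window_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False  # too many recent restarts: stop and alert instead
        self._restarts.append(now)
        return True
```

When `allow_restart()` returns False, the monitor would leave the service down and raise an alert rather than keep spinning the wheel.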
Regarding communicating with a monitoring system - what makes more sense to me is setting up a solid framework that gives folks the flexibility to use various monitoring tools, from sending an email to contacting PagerDuty or whatever. So, to me there are 3 parts to that:

1) At VR creation, ACS calls a defined hook script which knows how to contact the monitoring system to tell it about the system to monitor
2) At boot, the VR sends an API query to which the mgmt server responds with a URL for an install script - the VR runs that to download/set up the appropriate monitoring agent
3) The VR has standardized scripts for the agent to call to find out what should be running, and then the agent can go check for itself.

With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios module (I'm thinking the module is hosted outside ACS, but I guess it could be a plugin - see earlier licensing points).

Thoughts? Just my 2c. Happy to tweak the wiki if folks lean towards this.

John

1: Aside - this applies to SSVM creation currently - that hamster[2] keeps trying to spin that create SSVM wheel..
2: Apache CloudHamster, CloudMonkey's furry monitoring friend?

On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi <jayapalreddy.ur...@citrix.com> wrote:

> Please find below the updated FS
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
>
> Thanks,
> Jayapal
>
> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <santhosh.eduku...@citrix.com>
> wrote:
>
>> A shell script can be used. A few thoughts below:
>>
>> 1. Collect the process IDs of all daemons you want to monitor using the
>> "pidof" command, then use the "kill" command to check whether each pid is
>> valid: send signal 0 with kill, then check the exit status using echo $?.
>> To send a notification, use the Linux syslog call (man 3 syslog) or the
>> "logger" command to send to syslog.
>> If you want to send email, you may also have to
>> check that the firewall is not blocking outbound SMTP traffic. The same
>> holds for SNMP (I mean, if firewall rules are blocking it).
>> Using syslog may be good, as by default it exposes various debug log levels
>> through its API call.
>>
>> Now, to keep the monitor script always up and running: run the monitor
>> script continuously through cron, or through at, at regular/scheduled
>> intervals. This way, even if the monitor script goes down, it is up
>> again at the next interval.
>>
>> There is a catch with this, though: we may get multiple pids for a given
>> daemon if multiple daemons are spawned by the same or multiple
>> applications. If this scenario is not common then it's OK; otherwise we may
>> have to track it differently, maintaining the state of each spawned daemon
>> and checking whether it exists. If multiple applications launch the same
>> daemon, you may also want to report which application got killed. E.g.: A
>> launched httpd, and during its exit logic it kills all the daemons it
>> launched; then you may want to report that A is not available, rather than
>> just that http is not available.
>>
>>
>> 2. Using the netstat command: check for available, listening and active
>> ports on the local host, provided all the daemons you want to monitor are
>> running on "standard" ports or we know the listening ports of the daemons
>> to be monitored. Again, this script can be scheduled through cron/at to run
>> every x units; if it gets killed, the monitor script is up again after the
>> next x units.
>>
>> Also, there could be many other approaches as well.
>>
>>
>> Thanks!
>> Santhosh
>> ________________________________________
>> From: Jayapal Reddy Uradi [jayapalreddy.ur...@citrix.com]
>> Sent: Saturday, October 05, 2013 5:17 AM
>> To: <dev@cloudstack.apache.org>
>> Cc: <us...@cloudstack.apache.org>
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>
>> Hi,
>>
>> +users list
>> If anyone is already using any tools for monitoring, please share your
>> ideas.
>> Also share the cases where you experienced service crashes.
>>
>> Thanks,
>> Jayapal
>>
>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <chiradeep.vit...@citrix.com>
>> wrote:
>>
>>> Well just make sure that your script is resilient to its own crashes as
>>> well.
>>>
>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" <jayapalreddy.ur...@citrix.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am planning to write a script utility to monitor processes and restart
>>>> them in the event of failure. It will also log the events.
>>>>
>>>> Thanks,
>>>> Jayapal
>>>>
>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <swel...@ena.com> wrote:
>>>>
>>>>> supervisord maybe?
>>>>>
>>>>> ----- Original Message -----
>>>>>
>>>>> From: "Chiradeep Vittal" <chiradeep.vit...@citrix.com>
>>>>> To: dev@cloudstack.apache.org
>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>
>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>>
>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>>
>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>>>> <chiradeep.vit...@citrix.com> wrote:
>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts.
>>>>>>> It is simply too generic for the requirements outlined here. The
>>>>>>> proposal does not talk about modifying monit, just using it. That
>>>>>>> wouldn't trigger the AGPL.
>>>>>>
>>>>>> Let me restate my objection to anything AGPL.
>>>>>> People are largely comfortable with GPLv2 software - Linux is
>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>>>>>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>>>>>> license is anathema in many corporate environments, and by forcing it
>>>>>> on folks in the default System VM I fear it will hurt adoption of
>>>>>> CloudStack.
>>>>>>
>>>>>> --David
>>>>>
>>>>>
>>>>
>>>
>>
>