Very interesting, this monit thing... Is it possible to see how your script starts Kannel? My init.d script does not create a PID file in /var/run for smsbox, sqlbox, or any other box. How do you manage that?
Thank you!

Alejandro Ramírez

----- Original Message -----
From: Jonathan Houser <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Fri, 2 Sep 2005 07:44:47 -0600
Subject: Re: Kannel Suicide

>
> Reuben,
>
> > My problem is that when I start kannel and send an sms via the HTTP
> > service on my server everything is ok and I leave bearerbox and smsbox
> > running. But when I check back on them, I always find something wrong.
> > Sometimes smsbox kills itself and the HTTP service is no longer
> > available. Sometimes both processes kill themselves. Sometimes
> > bearerbox loses its connection with the SMSC and I have to restart
> > bearerbox for it to try to log in again. Is there a way this can be
> > avoided? If the connection is lost with the SMSC, can bearerbox
> > reconnect to the SMSC automatically? If the processes are terminated
> > unexpectedly, can I create a bash script that detects if they are
> > running and if not, launches them? I was going to killall bearerbox
> > and killall smsbox every few hours and relaunch them again, but it is
> > not very wise because kannel might still be down for hours until the
> > next killall and relaunch commands are executed. I can't afford one
> > second of downtime. And if kannel was performing ok, it would have been
> > killed and relaunched for nothing. Can anyone help please in keeping
> > kannel from committing suicide? Or if it does, can we resurrect it
> > immediately?
>
>     Been there. Done that. Two things: First of all, it would be
> good to do a 'grep -rF "PANIC" /your/kannel/log/dir/*' to find what's
> causing them to crash -- it may be something that's been fixed or could
> be fixed. Secondly, use a monitoring daemon. I built some init scripts
> for Kannel (one for bearerbox, one for smsbox -- then since I'm using
> Gentoo I made smsbox depend on {and thus automatically start} bearerbox,
> and bearerbox will also stop smsbox before it stops itself accordingly).
> I personally use monit. I have it watch the PID file of smsbox, and
> have it call "/etc/init.d/bearerbox stop" to stop the service,
> "/etc/init.d/smsbox start" to start it. I did have to do some extra
> work with scripts for monit because monit will call only the start
> script if the process is not running, which won't work because the
> damaged bearerbox really needs to be stopped first too. Anyway, back to
> the first statement. I have had what *was* the latest Kannel break WAP
> with some handsets, so I instead had to backport and add new patches to
> my working version to keep it from PANIC'ing. For whatever reason, if
> the slightest little thing goes wrong and triggers an assert(), Kannel
> freaks out and exits. For me it was empty PDUs from WAP and then more
> recently empty Lists. Neither of those actually causes catastrophic,
> irrecoverable damage to Kannel if I wrap them up in quick "if ( x ==
> NULL )" checks, so they obviously weren't that bad. You're probably
> seeing the same sort of non-fatal "oh crap, let's panic and exit" checks
> biting you. As stated, they may have already been fixed, or may be
> really easy to wrap up in a quick check to keep them from taking down
> your Kannel as well.
>
> Jon
>
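
For anyone following along without monit, the PID-file check and the stop-before-start sequence described in the quoted message can be approximated with a couple of shell functions. This is only a rough sketch under assumptions, not Jon's actual scripts: the function names `is_running` and `check_and_restart` are made up for illustration, and it presumes the init scripts (or whatever launches the boxes) already write a PID file somewhere readable.

```shell
#!/bin/sh
# Hypothetical watchdog helpers (illustrative names, not part of Kannel
# or monit). They assume each box's PID is written to a PID file.

# Return 0 if the PID recorded in the file $1 belongs to a live process.
# Note: kill -0 also fails for a process owned by another user, so run
# this as root or as the user that owns the daemon.
is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1          # no PID file: treat as dead
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1              # empty PID file: treat as dead
    kill -0 "$pid" 2>/dev/null             # signal 0 only tests existence
}

# If the watched process is gone, run the stop command first (so a
# damaged bearerbox is cleaned up), then run the start command.
# $1 = PID file, $2 = stop command, $3 = start command.
check_and_restart() {
    pidfile=$1 stop_cmd=$2 start_cmd=$3
    if is_running "$pidfile"; then
        return 0                           # still alive, nothing to do
    fi
    $stop_cmd                              # stop result is ignored on purpose
    $start_cmd
}
```

Called from cron or a loop, it might look like `check_and_restart /var/run/smsbox.pid "/etc/init.d/bearerbox stop" "/etc/init.d/smsbox start"`, where the PID file path is an assumption and depends on how the init script is written. Unlike a blind killall every few hours, this only restarts the boxes when the watched process has actually died.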
