Reuben,

> My problem is that when I start kannel and send an sms via the HTTP
> service on my server everything is ok and I leave bearerbox and smsbox
> running. But when I check back on them, I always find something wrong.
> Sometimes smsbox kills itself and the HTTP service is no longer
> available.. Sometimes both processes kill themselves. Sometimes
> bearerbox loses connection with the SMSC and I have to restart
> bearerbox for it to try to login again. Is there a way how this can be
> avoided? If the connection is lost with the SMSC, can bearerbox
> reconnect to the SMSC automatically? If the processes are terminated
> unexpectedly, can I create a bash script that detects if they are
> running and if not, it launches them? I was going to killall bearerbox
> and killall smsbox every few hours and re launch them again but it is
> not very wise because kannel might still be down for hours until the
> next killall and relaunch commands are executed. I dont afford one
> second of downtime. And if kannel was performing ok, it would have been
> killed and relaunched for nothing. Can anyone help please in avoiding
> kannel to commit suicide? Or if it does, can we resurrect it immediately?

     Been there.  Done that.  Two things:  First of all, it would be
good to run "grep -rF PANIC /your/kannel/log/dir/" to find out what's
causing them to crash; it may be something that has already been fixed
or could be fixed.

     Secondly, use a monitoring daemon.  I built some init scripts
for Kannel, one for bearerbox and one for smsbox.  Since I'm using
Gentoo, I made smsbox depend on (and thus automatically start)
bearerbox, and stopping bearerbox also stops smsbox first.  I
personally use monit.  I have it watch the PID file of smsbox, and have
it call "/etc/init.d/bearerbox stop" to stop the service and
"/etc/init.d/smsbox start" to start it; because of the dependency, each
command takes care of both daemons.  I did have to do some extra work
with scripts for monit, because monit calls only the start script when
the process is not running, which won't work: the damaged bearerbox
really needs to be stopped first too.

     Anyway, back to
the first point.  I have had what *was* the latest Kannel break WAP
with some handsets, so I had to backport new patches to my working
version instead to keep it from PANIC'ing.  For whatever reason, if the
slightest little thing goes wrong and triggers an assert(), Kannel
freaks out and exits.  For me it was empty PDUs from WAP and then, more
recently, empty Lists.  Neither of those causes catastrophic,
irrecoverable damage to Kannel once I wrap them in quick
"if ( x == NULL )" checks, so they obviously weren't that bad.  You're
probably being bitten by the same sort of non-fatal "oh crap, let's
panic and exit" checks.  As stated, they may already have been fixed, or
may be really easy to wrap in a quick check to keep them from taking
down your Kannel as well.
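
     If you would rather start with the bash script you asked about
than with monit, here is a minimal sketch of the same stop-then-start
logic.  It is not my actual monit setup.  The PID file path is a
hypothetical placeholder, and the two commands assume init scripts wired
together like mine (stopping bearerbox stops smsbox, starting smsbox
starts bearerbox), so adjust all three to your installation.

```shell
#!/bin/sh
# Watchdog sketch: restart the whole Kannel stack when smsbox is gone.
# All three values are assumptions; override them via the environment.
SMSBOX_PIDFILE="${SMSBOX_PIDFILE:-/var/run/kannel/smsbox.pid}"  # hypothetical path
STOP_CMD="${STOP_CMD:-/etc/init.d/bearerbox stop}"
START_CMD="${START_CMD:-/etc/init.d/smsbox start}"

# Succeeds only if the PID file is readable and names a live process.
# Note: kill -0 also fails for processes you are not allowed to signal,
# so run this as root or as the user that owns the Kannel processes.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1") || return 1
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# If smsbox is alive, do nothing.  Otherwise stop first (a damaged
# bearerbox may still be hanging around), then start everything again.
check_and_restart() {
    if pid_alive "$SMSBOX_PIDFILE"; then
        return 0
    fi
    echo "smsbox not running; restarting Kannel"
    $STOP_CMD      # unquoted on purpose: command plus its arguments
    $START_CMD
}
```

     Save it to a file, add a final line that calls check_and_restart,
and run that from cron every minute.  This only catches a dead process,
not a bearerbox that is still running but has lost its SMSC connection,
and a one-minute cron interval still leaves a gap that a monitoring
daemon with a shorter poll cycle would narrow.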

Jon
