I seem (inch’Allah!) to have gotten rid of a very vexing problem that
kept me wondering for weeks, though I am far from sure, and "solved"
would be too strong a word for such a haphazard process. I am posting
here in case someone recognizes this problem as something familiar...

On an otherwise very healthy Debian Linux host I saw idle /USR/SBIN/CRON
processes accumulate by the hundreds, at a rate of a few every few
minutes. After some time they induced significant load, although some of
them eventually died. Killing them was only a temporary remedy as they
kept reappearing. I could not link their appearance to specific cron
jobs or to a specific command. Hours of sifting through forums and
mailing lists yielded nothing conclusive: /USR/SBIN/CRON processes not
terminating were not unheard of, but their causes seemed to be varied
and most often quite mysterious.
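
For anyone who wants to watch for the same symptom, here is a minimal
Python sketch that picks long-lived /USR/SBIN/CRON processes out of a
process listing. It is not what I ran at the time. It assumes text shaped
like the output of `ps -eo pid,etimes,args` (PID, elapsed seconds, command
line), and the function name `stale_cron_pids` and the `min_age_s`
threshold are made up for illustration.

```python
def stale_cron_pids(ps_output, min_age_s=600):
    """Return PIDs of /USR/SBIN/CRON processes at least min_age_s seconds old.

    Assumption: ps_output looks like `ps -eo pid,etimes,args` output,
    i.e. PID, elapsed time in seconds, then the full command line.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything that is not "<pid> <age> <args>".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, age, args = int(parts[0]), int(parts[1]), parts[2]
        # The lingering children show up in upper case, unlike the daemon.
        if args.split()[0] == "/USR/SBIN/CRON" and age >= min_age_s:
            pids.append(pid)
    return pids
```

Feeding it the listing every few minutes and logging the length of the
result would show whether the processes are piling up or dying off.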

Liberal use of strace with various combinations of ‘-p’, ‘-f’, ‘-F’ and
‘-ff’, attached to the running cron daemon process and following vforks,
showed that the undead processes were left listening on an open
connection. I also observed that the /USR/SBIN/CRON spawning was
inhibited by an attached strace - in the presence of strace the children
did receive their missing SIGSTOP. Sometimes days went by with no
manifestation of the dreaded processes - but as soon as I thought the
problem was solved they began to reappear…

Anyway, finding that the undead processes were left listening on an open
connection was the smelly trail I was looking for. ‘netstat -p | grep
tcp | grep CRON’ soon showed me that each one of them had an open
connection to the local LDAP server. Then ‘lsof | grep cron | grep ldap’
hinted that it was not the cron process itself that was directly
connecting to the LDAP server but an underlying library involved in our
PAM LDAP user management system.
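
The netstat check above can be sketched in Python as well. This is only
an illustration under assumptions: it expects text in the usual Linux
`netstat -tp` or `netstat -tnp` layout, where the fifth column is the
foreign address and the seventh is "PID/Program name", and it treats
port "ldap" or "389" as the LDAP server. The function name
`cron_ldap_connections` is hypothetical.

```python
def cron_ldap_connections(netstat_output, ldap_ports=("ldap", "389")):
    """Return (pid, foreign_address) pairs for CRON sockets talking to LDAP.

    Assumption: each TCP line has at least seven whitespace-separated
    fields: proto, recv-q, send-q, local, foreign, state, pid/program.
    """
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # banner, header, or non-TCP socket
        foreign, pid_prog = fields[4], fields[6]
        if ":" not in foreign or "/" not in pid_prog:
            continue  # no port, or "-" when the owner is not visible
        port = foreign.rsplit(":", 1)[1]
        pid, prog = pid_prog.split("/", 1)
        if port in ldap_ports and prog == "CRON" and pid.isdigit():
            hits.append((int(pid), foreign))
    return hits
```

A growing list of hits over time would correspond to what I saw by hand
with netstat and grep.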

Armed with those new results I went hunting for some wild data and found
a discussion between Robert Rakowicz and Jerome Reinert about a somewhat
similar problem
(http://lists.debian.org/debian-user-german/2005/10/msg00989.html). But
the maintenance operations Jerome Reinert suggested on slapd’s Berkeley
DB database did not solve the problem.

In the meantime I have read another post mentioning that version
mismatches and assorted maintenance issues in slapd’s Berkeley DB
database can cause a similar problem. I can’t find the address anymore,
but if I do I’ll post it here. We found that a simple slapd restart rid
us of the undead /USR/SBIN/CRON processes. It has been a few days and I
have not seen one again… We keep our fingers crossed - maybe an upgrade
silently fixed the problem…

I also posted this on my blog at
http://serendipity.ruwenzori.net/index.php/2006/05/23/attack-of-the-undead-usrsbincron
