Why don't you set these cron jobs to run, say, every hour, so that debugging doesn't take you months?
Valery.

--- On Fri, 5/9/08, Shlomo Solomon <[EMAIL PROTECTED]> wrote:

> From: Shlomo Solomon <[EMAIL PROTECTED]>
> Subject: crash with no log entry
> To: Linux-IL@cs.huji.ac.il
> Date: Friday, May 9, 2008, 3:25 PM
>
> I've been having what "seemed" to be random crashes that left nothing
> in the logs, until I noticed that they always happen just after 2:02
> (while my daily cron jobs are running) - so they're not random after
> all. Here are the last 3 crashes - from 10/4, 6/5 and 9/5. You can see
> that there are no log entries after 2:02, until I do a hard re-boot:
>
> ----- 1 ------
> Apr 10 01:58:01 shlomo1 crond[9786]: (root) CMD (/data1/myscripts/myADSLtest)
> Apr 10 02:00:01 shlomo1 crond[9811]: (root) CMD (/data1/myscripts/myADSLtest)
> Apr 10 02:00:01 shlomo1 crond[9812]: (root) CMD (/data1/myscripts/myAlive)
> Apr 10 02:01:01 shlomo1 crond[9830]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
> Apr 10 02:02:01 shlomo1 crond[9845]: (root) CMD (/data1/myscripts/myADSLtest)
> Apr 10 02:02:01 shlomo1 crond[9846]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
> Apr 10 02:02:02 shlomo1 anacron[9856]: Updated timestamp for job `cron.daily' to 2008-04-10
> Apr 10 02:02:02 shlomo1 /etc/cron.daily/awffull[9859]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
> Apr 10 02:02:03 shlomo1 logrotate: ALERT exited abnormally with [1]
> Apr 10 05:38:51 shlomo1 syslogd 1.4.2: restart.
> Apr 10 05:38:51 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
> Apr 10 05:38:51 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007
>
> ----- 2 ------
> May 6 01:58:01 shlomo1 crond[21897]: (root) CMD (/data1/myscripts/myADSLtest)
> May 6 02:00:01 shlomo1 crond[21916]: (root) CMD (/data1/myscripts/myAlive)
> May 6 02:00:01 shlomo1 crond[21917]: (root) CMD (/data1/myscripts/myADSLtest)
> May 6 02:01:01 shlomo1 crond[21937]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
> May 6 02:02:01 shlomo1 crond[21951]: (root) CMD (/data1/myscripts/myADSLtest)
> May 6 02:02:01 shlomo1 crond[21952]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
> May 6 02:02:02 shlomo1 anacron[21962]: Updated timestamp for job `cron.daily' to 2008-05-06
> May 6 02:02:02 shlomo1 /etc/cron.daily/awffull[21965]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
> May 6 02:02:03 shlomo1 logrotate: ALERT exited abnormally with [1]
> May 6 04:47:50 shlomo1 syslogd 1.4.2: restart.
> May 6 04:47:50 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
> May 6 04:47:50 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007
>
> ----- 3 ------
> May 9 01:58:01 shlomo1 crond[27692]: (root) CMD (/data1/myscripts/myADSLtest)
> May 9 02:00:01 shlomo1 crond[27708]: (root) CMD (/data1/myscripts/myAlive)
> May 9 02:00:01 shlomo1 crond[27709]: (root) CMD (/data1/myscripts/myADSLtest)
> May 9 02:01:01 shlomo1 crond[27726]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
> May 9 02:02:01 shlomo1 crond[27741]: (root) CMD (/data1/myscripts/myADSLtest)
> May 9 02:02:01 shlomo1 crond[27742]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
> May 9 02:02:01 shlomo1 anacron[27752]: Updated timestamp for job `cron.daily' to 2008-05-09
> May 9 02:02:01 shlomo1 /etc/cron.daily/awffull[27755]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
> May 9 02:02:02 shlomo1 logrotate: ALERT exited abnormally with [1]
> May 9 05:36:05 shlomo1 syslogd 1.4.2: restart.
> May 9 05:36:05 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
> May 9 05:36:05 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007
>
> The common factor "seems" to be a problem with logrotate, but that's
> not the cause. Here's an example of logrotate aborting and NOT causing
> a crash. In fact, it seems logrotate gives that error every day. The
> "strange" thing is that all the logs seem to be properly rotated,
> despite the error message.
>
> May 7 01:58:01 shlomo1 crond[2870]: (root) CMD (/data1/myscripts/myADSLtest)
> May 7 02:00:01 shlomo1 crond[2888]: (root) CMD (/data1/myscripts/myAlive)
> May 7 02:00:01 shlomo1 crond[2889]: (root) CMD (/data1/myscripts/myADSLtest)
> May 7 02:01:01 shlomo1 crond[2906]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
> May 7 02:02:01 shlomo1 crond[2920]: (root) CMD (/data1/myscripts/myADSLtest)
> May 7 02:02:01 shlomo1 crond[2921]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
> May 7 02:02:01 shlomo1 anacron[2931]: Updated timestamp for job `cron.daily' to 2008-05-07
> May 7 02:02:01 shlomo1 /etc/cron.daily/awffull[2934]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
> May 7 02:02:02 shlomo1 logrotate: ALERT exited abnormally with [1]
> May 7 02:04:01 shlomo1 crond[3112]: (root) CMD (/data1/myscripts/myADSLtest)
> May 7 02:06:01 shlomo1 crond[3138]: (root) CMD (/data1/myscripts/myADSLtest)
> May 7 02:08:02 shlomo1 crond[3153]: (root) CMD (/data1/myscripts/myADSLtest)
> May 7 02:09:02 shlomo1 crond[3164]: (root) CMD ([ -d /var/lib/php ] && find /var/lib/php/ -type f -mmin +$(/usr/lib/php/maxlifetime) -print0 | xargs -r -0 rm)
>
> So, how do I find out what's causing the crash? My guess is that it's
> one of the daily cron jobs, but how can I find out which? Since the
> crashes happen at irregular intervals (sometimes 3 or 4 weeks apart and
> sometimes 2 days apart), it's not a simple matter of disabling some of
> the jobs to see if that solves the problem. That approach could take
> months.
>
> BTW, here's a list of the daily cron jobs. My guess is that the problem
> is a job running after logrotate, so that leaves 8 possibilities.
>
> [EMAIL PROTECTED] cron.daily]$ ls -l
> total 56
> -rwxr-xr-x 1 root root  276 2007-08-17 02:56 0anacron*
> -rwxr-xr-x 1 root root 2575 2007-09-01 13:56 awffull*
> -rwxr-xr-x 1 root root  396 2007-11-16 23:00 getskyepg*
> -rwxr-xr-x 1 root root  400 2007-08-28 21:44 hylafax*
> -rwxr-xr-x 1 root root   37 2007-01-28 19:59 logcheck*
> -rwxr-xr-x 1 root root  180 2007-07-19 23:57 logrotate*
> -rwxr-xr-x 1 root root  410 2007-08-31 01:48 makewhatis.cron*
> -rwxr-xr-x 1 root root  137 2007-09-24 17:26 mlocate.cron*
> lrwxrwxrwx 1 root root   27 2008-01-02 05:56 msec -> /usr/share/msec/security.sh*
> -rwxr-xr-x 1 root root  431 2006-02-05 22:56 my-aa-findlargefiles*
> lrwxrwxrwx 1 root root   26 2008-01-02 20:16 myRPMlist -> /data1/myscripts/myRPMlist*
> -rwxr-xr-x 1 root root  167 2005-01-10 12:51 reoback*
> -rwxr-xr-x 1 root root  118 2007-10-02 12:09 rpm*
> -rwxr-xr-x 1 root root  101 2007-11-20 19:55 tetex.cron*
> -rwxr-xr-x 1 root root  371 2007-08-08 18:35 tmpwatch*
> -rwxr-xr-x 1 root root  315 2007-09-05 13:24 tripwire-check*
>
> Can anyone suggest how to debug this problem? I did think of one idea
> and I'd like comments or suggestions. I could add several cron jobs to
> run after each of the "real" jobs (or add a line to each existing job)
> to send myself an e-mail to know what jobs have run, in order to see
> when the e-mails stop coming. However, I'm not sure if there are
> overlaps in the running of cron jobs - for example, is it possible that
> job number 2 starts before job number 1 has ended? If so, then my idea
> probably wouldn't work.
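A variation on the marker idea above, as a rough sketch rather than a tested fix: instead of e-mail (which may never leave the box if it locks up seconds later), write a START line before each job and an END line after it, and force every line to disk right away. The function names, the `runner` parameter and the log path in the usage note are made up for illustration. The sketch runs the jobs strictly one at a time in name order, so there are no overlaps to worry about; as far as I know run-parts does the same, but that is worth checking on your system. It also assumes job names contain no spaces.

```python
import os
import stat
import subprocess
import time


def _record(log, event, name, status=""):
    """Append one line to the progress log and force it onto the disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    log.write(f"{stamp} {event} {name} {status}".rstrip() + "\n")
    # A hard crash loses anything still buffered, so flush and fsync now.
    log.flush()
    os.fsync(log.fileno())


def run_jobs_logged(job_dir, log_path, runner=subprocess.call):
    """Run each executable file in job_dir one at a time, in name order,
    logging START before and END (with exit status) after each job."""
    with open(log_path, "a") as log:
        for name in sorted(os.listdir(job_dir)):
            path = os.path.join(job_dir, name)
            try:
                mode = os.stat(path).st_mode  # follows symlinks
            except OSError:
                continue  # e.g. a dangling symlink
            # Skip directories and files without any execute bit set.
            if not stat.S_ISREG(mode) or not mode & 0o111:
                continue
            _record(log, "START", name)
            try:
                status = runner([path])
            except OSError as exc:
                status = f"could-not-run:{exc.errno}"
            _record(log, "END", name, status)


def unfinished_jobs(log_path):
    """Return the names of jobs that logged START but never logged END."""
    pending = []
    with open(log_path) as log:
        for line in log:
            fields = line.split()
            if len(fields) < 4:
                continue
            event, name = fields[2], fields[3]
            if event == "START":
                pending.append(name)
            elif event == "END" and name in pending:
                pending.remove(name)
    return pending
```

You could call `run_jobs_logged("/etc/cron.daily", "/var/log/cron-progress.log")` from the crontab entry that currently calls run-parts (the log path is only an example). After the next crash and reboot, `unfinished_jobs("/var/log/cron-progress.log")` should name the job that was running when the machine died. Combined with running the jobs hourly instead of daily, that may shorten the wait considerably.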
>
> --
> Shlomo Solomon
> http://the-solomons.net
> Sent by KMail (KDE 3.5.7) on LINUX Mandriva 2008.0
>
> =================================================================
> To unsubscribe, send mail to [EMAIL PROTECTED] with
> the word "unsubscribe" in the message body, e.g., run the command
> echo unsubscribe | mail [EMAIL PROTECTED]