Why don't you schedule these cron jobs to run, say, every hour, so that
debugging doesn't take you months?
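
A rough sketch of how that could look, combined with the marker idea from
the quoted message: run the jobs one at a time and write a START and END
line around each, so that after a crash the last START without a matching
END points at the suspect. The function name run_marked and the log path
are made up for illustration. As far as I know, run-parts also runs the
scripts one after another in name order, but check the version on your
system. Also, sync only asks the kernel to flush its buffers, so a marker
can still be lost in a hard hang.

```shell
#!/bin/sh
# Hypothetical helper (not part of cron or run-parts): run every executable
# file in a directory, one at a time, and bracket each run with START/END
# markers in a log file.

run_marked() {
    dir=$1      # directory holding the jobs, e.g. /etc/cron.daily
    log=$2      # marker log; the path is the caller's choice

    # The shell expands the glob in sorted order, so jobs run by name.
    for job in "$dir"/*; do
        # Skip anything that is not an executable regular file
        # (symlinks to executables are followed and accepted).
        [ -f "$job" ] && [ -x "$job" ] || continue
        name=$(basename "$job")

        echo "$(date '+%b %e %T') START $name" >> "$log"
        sync    # try to get the marker onto disk before the job runs

        "$job"
        status=$?

        echo "$(date '+%b %e %T') END   $name (exit $status)" >> "$log"
        sync
    done
}
```

For a test you could call it hourly from root's crontab, for example as
run_marked /etc/cron.daily /var/log/cron-markers.log (log path
hypothetical), and look at the tail of that file after the next hard reboot.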

Valery.

--- On Fri, 5/9/08, Shlomo Solomon <[EMAIL PROTECTED]> wrote:

> From: Shlomo Solomon <[EMAIL PROTECTED]>
> Subject: crash with no log entry
> To: Linux-IL@cs.huji.ac.il
> Date: Friday, May 9, 2008, 3:25 PM
> 
> I've been having what "seemed" to be random crashes that left nothing
> in the logs, until I noticed that they always happen just after 2:02
> (while my daily cron jobs are running) - so they're not random after
> all. Here are the last 3 crashes - from 10/4, 6/5 and 9/5. You can see
> that there are no log entries after 2:02, until I do a hard re-boot:
> 
> ----- 1 ------
> Apr 10 01:58:01 shlomo1 crond[9786]: (root) CMD (/data1/myscripts/myADSLtest)
> Apr 10 02:00:01 shlomo1 crond[9811]: (root) CMD (/data1/myscripts/myADSLtest)
> Apr 10 02:00:01 shlomo1 crond[9812]: (root) CMD (/data1/myscripts/myAlive)
> Apr 10 02:01:01 shlomo1 crond[9830]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
> Apr 10 02:02:01 shlomo1 crond[9845]: (root) CMD (/data1/myscripts/myADSLtest)
> Apr 10 02:02:01 shlomo1 crond[9846]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
> Apr 10 02:02:02 shlomo1 anacron[9856]: Updated timestamp for job `cron.daily' to 2008-04-10
> Apr 10 02:02:02 shlomo1 /etc/cron.daily/awffull[9859]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
> Apr 10 02:02:03 shlomo1 logrotate: ALERT exited abnormally with [1]
> Apr 10 05:38:51 shlomo1 syslogd 1.4.2: restart.
> Apr 10 05:38:51 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
> Apr 10 05:38:51 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007
> 
> 
> ----- 2 ------
> May  6 01:58:01 shlomo1 crond[21897]: (root) CMD (/data1/myscripts/myADSLtest)
> May  6 02:00:01 shlomo1 crond[21916]: (root) CMD (/data1/myscripts/myAlive)
> May  6 02:00:01 shlomo1 crond[21917]: (root) CMD (/data1/myscripts/myADSLtest)
> May  6 02:01:01 shlomo1 crond[21937]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
> May  6 02:02:01 shlomo1 crond[21951]: (root) CMD (/data1/myscripts/myADSLtest)
> May  6 02:02:01 shlomo1 crond[21952]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
> May  6 02:02:02 shlomo1 anacron[21962]: Updated timestamp for job `cron.daily' to 2008-05-06
> May  6 02:02:02 shlomo1 /etc/cron.daily/awffull[21965]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
> May  6 02:02:03 shlomo1 logrotate: ALERT exited abnormally with [1]
> May  6 04:47:50 shlomo1 syslogd 1.4.2: restart.
> May  6 04:47:50 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
> May  6 04:47:50 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007
> 
> 
> ----- 3 ------
> May  9 01:58:01 shlomo1 crond[27692]: (root) CMD (/data1/myscripts/myADSLtest)
> May  9 02:00:01 shlomo1 crond[27708]: (root) CMD (/data1/myscripts/myAlive)
> May  9 02:00:01 shlomo1 crond[27709]: (root) CMD (/data1/myscripts/myADSLtest)
> May  9 02:01:01 shlomo1 crond[27726]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
> May  9 02:02:01 shlomo1 crond[27741]: (root) CMD (/data1/myscripts/myADSLtest)
> May  9 02:02:01 shlomo1 crond[27742]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
> May  9 02:02:01 shlomo1 anacron[27752]: Updated timestamp for job `cron.daily' to 2008-05-09
> May  9 02:02:01 shlomo1 /etc/cron.daily/awffull[27755]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
> May  9 02:02:02 shlomo1 logrotate: ALERT exited abnormally with [1]
> May  9 05:36:05 shlomo1 syslogd 1.4.2: restart.
> May  9 05:36:05 shlomo1 kernel: klogd 1.4.2, log source = /proc/kmsg started.
> May  9 05:36:05 shlomo1 kernel: Linux version 2.6.22.12-desktop586-1mdv ([EMAIL PROTECTED]) (gcc version 4.2.2 20070909 (prerelease) (4.2.2-0.RC.1mdv2008.0)) #1 SMP Tue Nov 20 08:09:17 EST 2007
> 
> 
> 
> The common factor "seems" to be a problem with logrotate, but that's
> not the cause. Here's an example of logrotate aborting and NOT causing
> a crash. In fact, it seems logrotate gives that error every day. The
> "strange" thing is that all the logs seem to be properly rotated,
> despite the error message.
> 
> 
> 
> May  7 01:58:01 shlomo1 crond[2870]: (root) CMD (/data1/myscripts/myADSLtest)
> May  7 02:00:01 shlomo1 crond[2888]: (root) CMD (/data1/myscripts/myAlive)
> May  7 02:00:01 shlomo1 crond[2889]: (root) CMD (/data1/myscripts/myADSLtest)
> May  7 02:01:01 shlomo1 crond[2906]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
> May  7 02:02:01 shlomo1 crond[2920]: (root) CMD (/data1/myscripts/myADSLtest)
> May  7 02:02:01 shlomo1 crond[2921]: (root) CMD (nice -n 19 time run-parts /etc/cron.daily)
> May  7 02:02:01 shlomo1 anacron[2931]: Updated timestamp for job `cron.daily' to 2008-05-07
> May  7 02:02:01 shlomo1 /etc/cron.daily/awffull[2934]: the /tmp/awffull.lock file was found indicating an error. Maybe awffull is still running...
> May  7 02:02:02 shlomo1 logrotate: ALERT exited abnormally with [1]
> May  7 02:04:01 shlomo1 crond[3112]: (root) CMD (/data1/myscripts/myADSLtest)
> May  7 02:06:01 shlomo1 crond[3138]: (root) CMD (/data1/myscripts/myADSLtest)
> May  7 02:08:02 shlomo1 crond[3153]: (root) CMD (/data1/myscripts/myADSLtest)
> May  7 02:09:02 shlomo1 crond[3164]: (root) CMD ([ -d /var/lib/php ] && find /var/lib/php/ -type f -mmin +$(/usr/lib/php/maxlifetime) -print0 | xargs -r -0 rm)
> 
> 
> 
> So, how do I find out what's causing the crash? My guess is that it's
> one of the daily cron jobs, but how can I find out which? Since the
> crashes happen at irregular intervals (sometimes 3 or 4 weeks apart
> and sometimes 2 days apart), it's not a simple matter of disabling
> some of the jobs to see if that solves the problem. That approach
> could take months.
> 
> BTW, here's a list of the daily cron jobs. My guess is that the
> problem is a job running after logrotate, so that leaves 8
> possibilities.
> 
> 
> [EMAIL PROTECTED] cron.daily]$ ls -l
> total 56
> -rwxr-xr-x 1 root root  276 2007-08-17 02:56 0anacron*
> -rwxr-xr-x 1 root root 2575 2007-09-01 13:56 awffull*
> -rwxr-xr-x 1 root root  396 2007-11-16 23:00 getskyepg*
> -rwxr-xr-x 1 root root  400 2007-08-28 21:44 hylafax*
> -rwxr-xr-x 1 root root   37 2007-01-28 19:59 logcheck*
> -rwxr-xr-x 1 root root  180 2007-07-19 23:57 logrotate*
> -rwxr-xr-x 1 root root  410 2007-08-31 01:48 makewhatis.cron*
> -rwxr-xr-x 1 root root  137 2007-09-24 17:26 mlocate.cron*
> lrwxrwxrwx 1 root root   27 2008-01-02 05:56 msec -> /usr/share/msec/security.sh*
> -rwxr-xr-x 1 root root  431 2006-02-05 22:56 my-aa-findlargefiles*
> lrwxrwxrwx 1 root root   26 2008-01-02 20:16 myRPMlist -> /data1/myscripts/myRPMlist*
> -rwxr-xr-x 1 root root  167 2005-01-10 12:51 reoback*
> -rwxr-xr-x 1 root root  118 2007-10-02 12:09 rpm*
> -rwxr-xr-x 1 root root  101 2007-11-20 19:55 tetex.cron*
> -rwxr-xr-x 1 root root  371 2007-08-08 18:35 tmpwatch*
> -rwxr-xr-x 1 root root  315 2007-09-05 13:24 tripwire-check*
> 
> 
> Can anyone suggest how to debug this problem? I did think of one idea
> and I'd like comments or suggestions. I could add several cron jobs to
> run after each of the "real" jobs (or add a line to each existing job)
> to send myself an e-mail to know what jobs have run, in order to see
> when the e-mails stop coming. However, I'm not sure if there are
> overlaps in the running of cron jobs - for example, is it possible
> that job number 2 starts before job number 1 has ended? If so, then my
> idea probably wouldn't work.
> 
> 
> -- 
> Shlomo Solomon
> http://the-solomons.net
> Sent by KMail (KDE 3.5.7) on LINUX Mandriva 2008.0
> 
> 
> =================================================================
> To unsubscribe, send mail to [EMAIL PROTECTED]
> with
> the word "unsubscribe" in the message body, e.g.,
> run the command
> echo unsubscribe | mail [EMAIL PROTECTED]


      