To polish my Bacula (2.2.4 on CentOS 5) configuration a little, I tried adding a Max Wait Time directive, along with two other time limits, to my job defaults. More specifically:
  Max Start Delay = 7200
  Max Run Time = 7000
  Max Wait Time = 900

So my goals were that:
- a job won't start if it's delayed by more than 2 hours
- no single job may run longer than approx. 1 hour 56 minutes
- a job will be canceled if it's stuck for 15 minutes, e.g. waiting for a missing tape

Last night there were two full backups scheduled to run at 01:05, both with the same priority of 10. No concurrent jobs are allowed, so one of the jobs was expected to wait for the first one to finish. This is what happened:

  01-Oct 01:05 dogbert-dir: Start Backup JobId 163, Job=Backup-Dogbert.2007-10-01_01.05.00
  ...
  01-Oct 01:20 dogbert-dir: Backup-Dogbert.2007-10-01_01.05.00 Fatal error: Max wait time exceeded. Job canceled.
  01-Oct 01:20 dogbert-fd: Backup-Dogbert.2007-10-01_01.05.00 Fatal error: backup.c:892 Network send error to SD. ERR=Input/output error
  01-Oct 01:20 dogbert-fd: Backup-Dogbert.2007-10-01_01.05.00 Error: bsock.c:311 Wrote 65536 bytes to Storage daemon:dogbert:9103, but only 16040 accepted.
  01-Oct 01:20 dogbert-sd: Job Backup-Dogbert.2007-10-01_01.05.00 marked to be canceled.
  01-Oct 01:20 dogbert-sd: Job Backup-Dogbert.2007-10-01_01.05.00 marked to be canceled.
  01-Oct 01:20 dogbert-sd: Backup-Dogbert.2007-10-01_01.05.00 Fatal error: append.c:259 Network error on data channel. ERR=Connection reset by peer
  01-Oct 01:20 dogbert-sd: Job write elapsed time = 00:11:41, Transfer rate = 6.008 M bytes/second
  01-Oct 01:20 dogbert-dir: Bacula dogbert-dir 2.2.4 (14Sep07): 01-Oct-2007 01:20:22
  ...
    FD Bytes Written:       4,192,251,934 (4.192 GB)
    SD Bytes Written:       4,211,990,004 (4.211 GB)
    Rate:                   4556.8 KB/s

So the job had been running successfully for the first 15 minutes, backing up over 4 GB of data (a reasonable rate for my hardware), and then it was canceled because of the 15-minute wait time limit??? Is this correct behavior? I expected "Max Run Time" to be the only limit that could stop a job that is actively running like this.
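To make the timing concrete, here is a small Python sketch of the deadlines as I understood them from the documentation. It is only my reading of the directives, not how Bacula actually implements them; the helper name expected_deadlines is made up, and I assume the timestamps have :00 seconds since the log only shows minutes.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Directive values from my job defaults, in seconds.
MAX_START_DELAY = 7200
MAX_RUN_TIME = 7000
MAX_WAIT_TIME = 900

# JobId 163: scheduled and started at 01:05 (seconds assumed to be :00).
SCHEDULED = datetime(2007, 10, 1, 1, 5, 0)
# The log only shows "01:20" for the cancellation (minute resolution).
CANCELED = datetime(2007, 10, 1, 1, 20, 0)


def expected_deadlines(scheduled: datetime, started: datetime,
                       blocked_since: Optional[datetime]) -> Dict[str, datetime]:
    """Hypothetical helper: deadlines as I read the directives."""
    deadlines = {
        # Max Start Delay: counted from the scheduled time.
        "start_by": scheduled + timedelta(seconds=MAX_START_DELAY),
        # Max Run Time: counted from the actual start.
        "finish_by": started + timedelta(seconds=MAX_RUN_TIME),
    }
    # Max Wait Time: only relevant while the job is blocked on a resource.
    if blocked_since is not None:
        deadlines["unblock_by"] = blocked_since + timedelta(seconds=MAX_WAIT_TIME)
    return deadlines


# JobId 163 was not blocked: the SD reports 11:41 of writing at about 6 MB/s.
write_elapsed = 11 * 60 + 41            # "Job write elapsed time = 00:11:41"
sd_bytes = 4_211_990_004                # "SD Bytes Written"
rate_mb = sd_bytes / write_elapsed / 1e6

expected = expected_deadlines(SCHEDULED, SCHEDULED, blocked_since=None)
# What the log actually shows matches "scheduled time + Max Wait Time".
observed_deadline = SCHEDULED + timedelta(seconds=MAX_WAIT_TIME)

for name, when in expected.items():
    print(f"{name:10} {when:%H:%M:%S}")
print(f"{'observed':10} {observed_deadline:%H:%M:%S}")
print(f"SD rate    {rate_mb:.2f} MB/s")
```

With my settings, the earliest limit I expected to apply to this job was Max Run Time at 03:01:40, yet the actual cancellation at 01:20 is exactly 01:05 plus 900 seconds.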
Then there is the other job, which was scheduled to start at the same time but could not start until the first one finished:

  01-Oct 01:20 dogbert-dir: Backup-Dilbert.2007-10-01_01.05.01 Fatal error: Max wait time exceeded. Job canceled.

So this job was canceled before it had even started, because "max wait time" had supposedly been exceeded. However, the Bacula documentation says:

  Max Wait Time = <time>
  The time specifies the maximum allowed time that a job may block waiting for a resource (such as waiting for a tape to be mounted, or waiting for the storage or file daemons to perform their duties), counted from the when the job starts, (not necessarily the same as when the job was scheduled).

Now it looks to me as if Max Wait Time is counted from the scheduled time anyway. I increased Max Wait Time to 6000 and ran the jobs manually, and that worked for now. But I'm still wondering why things went this way.

Regards,
Timo

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users