Am I correct in assuming that the watchdog timer is killing my jobs? I have to constrain my backup jobs to less than 15 MBps with traffic shaping in this environment to avoid bandwidth contention, so even small jobs take a long time for bacula-fd to copy over.
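For a sense of scale, here is a rough back-of-the-envelope sketch of how much data could move inside the 6-day watchdog window at the shaped rate. It assumes "15 MBps" means 15 decimal megabytes per second sustained with no stalls, which real traffic will not achieve, and the function names are only illustrative. It also converts the two durations that appear in the log and job file below into days.

```python
# Rough sizing sketch, not a Bacula tool: how much data fits in the
# watchdog window at a constant shaped rate.

WATCHDOG_LIMIT_SECS = 6 * 24 * 60 * 60  # 518400 s, the 6-day figure from the manual
SHAPED_RATE_MB_PER_SEC = 15             # assumption: decimal megabytes/s, sustained


def max_bytes_before_watchdog(rate_mb_per_sec, limit_secs=WATCHDOG_LIMIT_SECS):
    """Upper bound on bytes transferred before the limit, at a constant rate."""
    return rate_mb_per_sec * 1_000_000 * limit_secs


def secs_to_days(secs):
    """Split a duration in seconds into (whole days, leftover seconds)."""
    return divmod(secs, 86_400)


if __name__ == "__main__":
    cap_tb = max_bytes_before_watchdog(SHAPED_RATE_MB_PER_SEC) / 1e12
    print(f"Data cap in 6 days at 15 MB/s: {cap_tb:.3f} TB")
    print("Watchdog fired after (days, secs):", secs_to_days(518417))
    print("Configured Max Run Time (days, secs):", secs_to_days(8035200))
```

Under those assumptions the ceiling is about 7.8 TB per job, so anything larger cannot finish inside 6 days at this rate regardless of the Max Run Time setting.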
Assuming I am correct that the watchdog timer is to blame, what's the workaround? The traffic shaping is a hard client requirement.

23-Dec 14:02 bacula-dir JobId 60: Using Device "Drive0" to write.
23-Dec 14:02 Scalar-i40 JobId 60: Spooling data ...
29-Dec 14:02 bacula-dir JobId 60: Error: Watchdog sending kill after 518417 secs to thread stalled reading Storage daemon.
29-Dec 14:02 bacula-dir JobId 60: Error: Watchdog sending kill after 518417 secs to thread stalled reading File daemon.
29-Dec 14:02 bacula-dir JobId 60: Fatal error: Network error with FD during Backup: ERR=Interrupted system call
29-Dec 14:02 bacula-dir JobId 60: Error: Director's comm line to SD dropped.
29-Dec 14:02 bacula-dir JobId 60: Fatal error: No Job status returned from FD.
29-Dec 14:02 bacula-dir JobId 60: Error: Bacula bacula-dir 5.2.13 (19Jan13):
  Build OS:               x86_64-redhat-linux-gnu redhat (Core)
  JobId:                  60
  Job:                    GCC_archive-everyone-else.2019-12-23_14.02.07_03
  Backup Level:           Full
  Client:                 "bock" 5.2.13 (19Jan13) x86_64-redhat-linux-gnu,redhat,(Core)
  FileSet:                "GCC_archive-everyone-else-fileset" 2019-12-23 13:59:50
  Pool:                   "lto6-pool" (From Job resource)
  Catalog:                "GACCatalog" (From Client resource)
  Storage:                "Scalar-i40" (From Pool resource)
  Scheduled time:         23-Dec-2019 14:02:05
  Start time:             23-Dec-2019 14:02:09
  End time:               29-Dec-2019 14:02:26
  Elapsed time:           6 days 17 secs
  Priority:               12
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):
  Volume Session Id:      6
  Volume Session Time:    1577127434
  Last Volume Bytes:      387,072 (387.0 KB)
  Non-fatal FD errors:    3
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

[root@bock Job]# cat GCC_archive-everyone-else-job.conf
#JN job file for Bock
#----------------------------------
Job {
  Name = GCC_archive-everyone-else
  Type = Backup
  Client = bock
  Schedule = ManualOnly
  Messages = Daemon
  FileSet = GCC_archive-everyone-else-fileset
  Level = Full
  Pool = lto6-pool
  Priority = 12
  Max Run Time = 8035200 # default limit is 6 days, 518400sec. bumped 3x just in case
  Spool Data = yes
  Spool Attributes = yes # spools catalog entries to disk until after file is backed up. If a job fails, catalogue remains clean

  ##JN backup Bacula DB when job is done.
  RunScript {
    RunsWhen = After
    FailJobOnError = No
    Command = "/usr/local/sbin/backup-bacula-db.sh"
  }
}
#----------------------------------

https://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html

Max Run Time = time
  The time specifies the maximum allowed time that a job may run, counted from when the job starts, (not necessarily the same as when the job was scheduled).
  By default, the watchdog thread will kill any Job that has run more than 6 days. The maximum watchdog timeout is independent of MaxRunTime and cannot be changed.

--
Thanks,
John H. Nyhuis
Desk: (206)-685-8334
jnyh...@uw.edu
Box 359461, 15th floor, 106