Re: [Bacula-users] Getting around the watchdog timer in bacula 5.2.13-23.1.el7.x86_64

Martin Simmons Fri, 03 Jan 2020 09:03:23 -0800

According to your log, it transferred nothing in 6 days (FD Bytes Written
and SD Bytes Written are both 0).  That is a very extreme definition of
traffic shaping!  Are you sure you haven't set the transfer rate too low?
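
As a rough sanity check on the numbers in the quoted log and job file, here is
a minimal sketch. It assumes "15 MBps" means 15 megabytes per second (10**6
bytes); if it means megabits, divide the volume by 8. The variable and
function names are only illustrative and are not part of Bacula.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_to_days(seconds):
    """Convert a duration in seconds to (possibly fractional) days."""
    return seconds / SECONDS_PER_DAY


# Values copied from the quoted log and job file.
default_limit_secs = 518400    # 6-day watchdog limit quoted from the manual
watchdog_kill_secs = 518417    # "Watchdog sending kill after 518417 secs"
max_run_time_secs = 8035200    # "Max Run Time" in the job resource

print(seconds_to_days(default_limit_secs))      # 6.0
print(seconds_to_days(max_run_time_secs))       # 93.0
print(watchdog_kill_secs - default_limit_secs)  # 17, matching "6 days 17 secs"

# ASSUMPTION: 15 MBps = 15 * 10**6 bytes per second.
shaped_rate_bytes_per_sec = 15 * 10**6
bytes_in_six_days = shaped_rate_bytes_per_sec * default_limit_secs
print(bytes_in_six_days / 10**12)               # 7.776 (TB)
```

So a link that really sustained that rate could have moved several terabytes
before the watchdog fired, whereas the job summary reports 0 bytes written.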


__Martin


>>>>> On Mon, 30 Dec 2019 18:03:35 +0000, John H Nyhuis said:
> 
> Am I correct in assuming that the watchdog timer is killing my jobs?
> 
> I have to constrain my backup jobs to less than 15 MBps with traffic 
> shaping in this environment to avoid bandwidth contention, so even small 
> jobs take a long time for bacula-fd to copy over.
> 
> Assuming I am correct that the watchdog timer is to blame, what's the 
> workaround?  The traffic shaping is a hard client requirement.
> 
> 
> 23-Dec 14:02 bacula-dir JobId 60: Using Device "Drive0" to write.
> 23-Dec 14:02 Scalar-i40 JobId 60: Spooling data ...
> 29-Dec 14:02 bacula-dir JobId 60: Error: Watchdog sending kill after 518417 secs to thread stalled reading Storage daemon.
> 29-Dec 14:02 bacula-dir JobId 60: Error: Watchdog sending kill after 518417 secs to thread stalled reading File daemon.
> 29-Dec 14:02 bacula-dir JobId 60: Fatal error: Network error with FD during Backup: ERR=Interrupted system call
> 29-Dec 14:02 bacula-dir JobId 60: Error: Director's comm line to SD dropped.
> 29-Dec 14:02 bacula-dir JobId 60: Fatal error: No Job status returned from FD.
> 29-Dec 14:02 bacula-dir JobId 60: Error: Bacula bacula-dir 5.2.13 (19Jan13):
>    Build OS:               x86_64-redhat-linux-gnu redhat (Core)
>    JobId:                  60
>    Job:                    GCC_archive-everyone-else.2019-12-23_14.02.07_03
>    Backup Level:           Full
>    Client:                 "bock" 5.2.13 (19Jan13) x86_64-redhat-linux-gnu,redhat,(Core)
>    FileSet:                "GCC_archive-everyone-else-fileset" 2019-12-23 13:59:50
>    Pool:                   "lto6-pool" (From Job resource)
>    Catalog:                "GACCatalog" (From Client resource)
>    Storage:                "Scalar-i40" (From Pool resource)
>    Scheduled time:         23-Dec-2019 14:02:05
>    Start time:             23-Dec-2019 14:02:09
>    End time:               29-Dec-2019 14:02:26
>    Elapsed time:           6 days 17 secs
>    Priority:               12
>    FD Files Written:       0
>    SD Files Written:       0
>    FD Bytes Written:       0 (0 B)
>    SD Bytes Written:       0 (0 B)
>    Rate:                   0.0 KB/s
>    Software Compression:   None
>    VSS:                    no
>    Encryption:             no
>    Accurate:               no
>    Volume name(s):
>    Volume Session Id:      6
>    Volume Session Time:    1577127434
>    Last Volume Bytes:      387,072 (387.0 KB)
>    Non-fatal FD errors:    3
>    SD Errors:              0
>    FD termination status:  Error
>    SD termination status:  Error
>    Termination:            *** Backup Error ***
> 
> 
> 
> [root@bock Job]# cat GCC_archive-everyone-else-job.conf
> #JN job file for Bock
> #----------------------------------
> Job {
>    Name = GCC_archive-everyone-else
>    Type = Backup
>    Client = bock
>    Schedule = ManualOnly
>    Messages = Daemon
>    FileSet = GCC_archive-everyone-else-fileset
>    Level = Full
>    Pool = lto6-pool
>    Priority = 12
>    Max Run Time = 8035200 # default limit is 6 days, 518400sec. bumped 3x just in case
>    Spool Data = yes
>    Spool Attributes = yes  # spools catalog entries to disk until after file is backed up.  If a job fails, catalogue remains clean
> 
> ##JN backup Bacula DB when job is done.
> RunScript {
>          RunsWhen = After
>          FailJobOnError = No
>          Command = "/usr/local/sbin/backup-bacula-db.sh"
>     }
> }
> #----------------------------------
> 
> 
> 
> https://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html
> 
> Max Run Time = time
>      The time specifies the maximum allowed time that a job may run, 
> counted from when the job starts, (not necessarily the same as when the 
> job was scheduled).
>      By default, the watchdog thread will kill any Job that has run 
> more than 6 days. The maximum watchdog timeout is independent of 
> MaxRunTime and cannot be changed.
> 
> -- 
> Thanks,
> 
> John H. Nyhuis
> Desk: (206)-685-8334
> jnyh...@uw.edu
> Box 359461, 15th floor, 106
> 
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
> 


