Hi all,
I have a setup where the Bacula Director is hosted on AWS while one of the
Bacula clients is hosted elsewhere, and I quite
often see errors like this (for backup jobs):
2014-03-17 07:25:04 XXX-sd JobId 1179: Recycled volume "XXX_pool_0255" on
device "FileStorage5" (/mnt/backups), all previous data lost.
2014-03-17 07:25:04 XXX-dir JobId 1179: Volume used once. Marking Volume
"XXX_pool_0255" as Used.
2014-03-17 07:41:06 XXX-sd JobId 1179: Fatal error: append.c:161 Error
reading data header from FD. ERR=Connection timed out
2014-03-17 07:41:06 XXX-sd JobId 1179: Job write elapsed time = 00:16:02,
Transfer rate = 0 Bytes/second
2014-03-17 08:41:36 XXX-dir JobId 1179: Fatal error: Network error with
FD during Backup: ERR=Connection timed out
or like this (for verify jobs):
2014-03-16 13:10:50 XXX-dir JobId 1154: Start Verify JobId=1154
Level=VolumeToCatalog Job=XXX_verify.2014-03-16_07.00.00_18
2014-03-16 13:10:50 XXX-dir JobId 1154: Using Device "FileStorage5"
2014-03-16 13:38:52 XXX-sd JobId 1154: Ready to read from volume
"XXX_pool_0248" on device "FileStorage5" (/mnt/backups).
2014-03-16 15:49:18 XXX-sd JobId 1154: End of Volume at file 12 on device
"FileStorage5" (/mnt/backups), Volume "hondaextranet.ru_pool_0248"
2014-03-16 16:01:42 XXX-sd JobId 1154: Ready to read from volume
"XXX_pool_0252" on device "FileStorage5" (/mnt/backups).
2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: verify.c:758 bdird
2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: Network error with
FD during Verify: ERR=Connection reset by peer
2014-03-16 16:10:36 XXX-dir JobId 1154: Fatal error: No Job status
returned from FD.
or like this (also for verify jobs):
2014-03-16 16:27:14 XXX-sd JobId 1155: Ready to read from volume
"XXX_pool_0248" on device "FileStorage5" (/mnt/backups).
2014-03-17 03:10:31 XXX-dir JobId 1155: Fatal error: verify.c:758 bdird
2014-03-17 03:10:31 XXX-dir JobId 1155: Fatal error: Network error with
FD during Verify: ERR=Connection timed out
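To put the timings of the first (backup) log in perspective, here is a small
Python sketch that just subtracts the timestamps copied from the JobId 1179
lines above. The helper name gap() and the variable names are only
illustrative, they are not part of Bacula.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def gap(start: str, end: str):
    """Return the timedelta between two log timestamps (hypothetical helper)."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

# Timestamps copied from the JobId 1179 log above.
volume_recycled = "2014-03-17 07:25:04"   # SD recycled the volume
sd_fatal_error = "2014-03-17 07:41:06"    # SD: "Connection timed out"
dir_fatal_error = "2014-03-17 08:41:36"   # Dir: "Network error with FD"

sd_wait = gap(volume_recycled, sd_fatal_error)
dir_wait = gap(sd_fatal_error, dir_fatal_error)

print("SD gave up after:", sd_wait)              # 0:16:02
print("Dir reported failure after:", dir_wait)   # 1:00:30
```

The first gap matches the "Job write elapsed time = 00:16:02" line, and the
Director only reported the failure about an hour after the SD did.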
The backups for this client are quite big (~70 GB, split into 2
volumes), the transfer rate is about 3-4 MB/s, and a full backup job takes
about 6-7 hours to complete.
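As a rough sanity check on those numbers, here is a small Python sketch. It
assumes the figures are gigabytes and megabytes per second in decimal units
and that the rate is steady for the whole job. The function name
transfer_hours() is just illustrative.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mb_per_s megabytes/second."""
    seconds = size_gb * 1000 / rate_mb_per_s  # assumes 1 GB = 1000 MB
    return seconds / 3600

for rate in (3, 4):
    print(f"{rate} MB/s -> {transfer_hours(70, rate):.1f} h")
# 3 MB/s -> 6.5 h
# 4 MB/s -> 4.9 h
```

So the 6-7 hours we actually see corresponds to the lower end of that rate
range, roughly 3 MB/s sustained.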
Sometimes both jobs complete OK, but quite often we hit errors like the ones
above, which I think are caused by some kind of network outage. Heartbeat
intervals are set to 60 on all of Dir, SD and FD.
Is there a way to deal with this kind of problem?
Thanks,
Timur
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users