On a Bacula installation with a CentOS 6.2 (64 bit) server running Bacula 5.2.6 director and SD with an HP LTO3 autoloader and backing up some twenty-odd machines of various platforms, all the jobs report "User specified spool size reached" much too early, currently after 1.45 GB instead of the really specified size of 50 GB. Example log excerpt:
10-Mai 22:09 bacula-sd JobId 2117: Spooling data ... 10-Mai 22:09 bacula-sd JobId 2117: User specified spool size reached. 10-Mai 22:09 bacula-sd JobId 2117: Writing spooled data to Volume. Despooling 1,450,176,877 bytes ... 10-Mai 22:10 bacula-sd JobId 2117: Despooling elapsed time = 00:00:22, Transfer rate = 65.91 M Bytes/second 10-Mai 22:10 bacula-sd JobId 2117: Spooling data again ... 10-Mai 22:10 bacula-sd JobId 2117: User specified spool size reached. 10-Mai 22:10 bacula-sd JobId 2117: Writing spooled data to Volume. Despooling 1,450,176,900 bytes ... The relevant device resource from /etc/bacula/bacula-sd.conf is: Device { Name = LTO-3 Media Type = LTO-3 Archive Device = /dev/nst0 AutomaticMount = yes; # when device opened, read it AlwaysOpen = yes; RemovableMedia = yes; RandomAccess = no; Maximum File Size = 4GB Changer Command = "/usr/libexec/bacula/mtx-changer %c %o %S %a %d" Changer Device = /dev/sg4 AutoChanger = yes SpoolDirectory = /home/backup/spool; MaximumSpoolSize = 50G; # CentOS 6 tapeinfo needs generic device Alert Command = "sh -c 'tapeinfo -f /dev/sg3 |grep TapeAlert|cat'" ## If you have smartctl, enable this, it has more info than tapeinfo # Alert Command = "sh -c 'smartctl -H -l error %c'" } Scanning the log it looks like the storage daemon decreases its spool size limit every time a backup job crashes. The last time it used the full 50 GB was here: 04-Mai 23:36 bacula-sd JobId 2040: Writing spooled data to Volume. Despooling 53,687,127,810 bytes ... That server then crashed during the backup (cause undetermined as yet, but probably hardware related). Bacula reported: 04-Mai 23:36 bacula-sd JobId 2040: User specified spool size reached. 04-Mai 23:36 bacula-sd JobId 2040: Writing spooled data to Volume. Despooling 53,687,127,810 bytes ... 04-Mai 23:52 bacula-sd JobId 2040: Despooling elapsed time = 00:16:16, Transferrate = 55.00 M Bytes/second 04-Mai 23:52 bacula-sd JobId 2040: Spooling data again ... 05-Mai 00:46 backup-dir JobId 2040: Fatal error: Network error with FD during Backup: ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt 05-Mai 00:46 backup-dir JobId 2040: Fatal error: No Job status returned from FD. (The text after ERR= is the German translation of "Connection reset by peer.") Then it proceeded to the next client, but with the spool limit reduced to 10 GB: 05-Mai 01:13 bacula-sd JobId 2042: User specified spool size reached. 05-Mai 01:13 bacula-sd JobId 2042: Writing spooled data to Volume. Despooling 9,902,626,948 bytes ... (Job 2041 was smaller than 10 GB.) Similarly, four days later, a manual backup job failed because of a network error: 09-Mai 10:20 bacula-sd JobId 2075: User specified spool size reached. 09-Mai 10:20 bacula-sd JobId 2075: Writing spooled data to Volume. Despooling 9,902,627,262 bytes ... 09-Mai 10:26 bacula-sd JobId 2075: Despooling elapsed time = 00:05:14, Transfer rate = 31.53 M Bytes/second 09-Mai 10:26 bacula-sd JobId 2075: Spooling data again ... 09-Mai 12:27 backup-dir JobId 2075: Fatal error: Network error with FD during Backup: ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt 09-Mai 12:27 backup-dir JobId 2075: Fatal error: No Job status returned from FD. Subsequent jobs ran with a further reduced spool limit of 2 GB: 09-Mai 22:07 bacula-sd JobId 2097: Spooling data ... 09-Mai 22:08 bacula-sd JobId 2097: Writing spooled data to Volume. Despooling 1,937,978,333 bytes ... Later that night, we had rescheduled the backup of the machine that had crashed on May 4, and it crashed again: 10-Mai 03:55 bacula-sd JobId 2096: User specified spool size reached. 10-Mai 03:55 bacula-sd JobId 2096: Writing spooled data to Volume. Despooling 1,937,978,308 bytes ... 10-Mai 03:56 bacula-sd JobId 2096: Despooling elapsed time = 00:01:02, Transfer rate = 31.25 M Bytes/second 10-Mai 03:56 bacula-sd JobId 2096: Spooling data again ... 10-Mai 04:14 backup-dir JobId 2096: Fatal error: Network error with FD during Backup: ERR=Die Wartezeit für die Verbindung ist abgelaufen 10-Mai 04:14 backup-dir JobId 2096: Fatal error: No Job status returned from FD. Since then, the spool size limit used by the SD sits at 1.45 GB: 10-Mai 22:09 bacula-sd JobId 2117: Writing spooled data to Volume. Despooling 1,450,176,877 bytes ... The spool directory contained just two files of size zero. I have removed those, restarted the SD, and am now waiting for the next scheduled backup. Looks like a bug to me ... Best regards, Tilman
signature.asc
Description: OpenPGP digital signature
------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users