Hello,
We are running Bacula 5.0.3 on RHEL and CentOS. I recently had a 16.5 TB
backup fail at the end, when the system tried to send the spooled attribute
data to the Director; the messages are below. The backend database is MySQL:
[root@hulsbackup lib]# mysql -V
mysql Ver 14.12 Distrib 5.0.77, for redhat-linux-gnu (x86_64) using readline 5.1
It lives on the same machine, on the same partition as the data spool
directory. All backup data appears to have been spooled and written to tape
successfully.
I have successfully backed up a 5 TB data set before this. However, between
that backup and this failed one, we moved the Bacula server to a different
network and changed to an LACP bonded interface.
There is a local iptables firewall running on the Bacula server.
In addition, we kept hitting the 6-day limit at which backups were
automatically killed, so I changed the following lines and recompiled with a
60-day limit on both the Bacula server and the client:
bnet.c: bsock->timeout = 60 * 60 * 60 * 24; /* 60 days timeout */
bsock.c: timeout = 60 * 60 * 60 * 24; /* 60 days timeout */
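As a sanity check on the arithmetic in those two lines, here is a minimal
sketch in C. The helper names days_to_seconds, patched_timeout and
fits_in_int are made up for illustration and are not part of Bacula. The
sketch only confirms that 60 * 60 * 60 * 24 is 5,184,000 seconds (60 days) and
that this value fits in a 32-bit signed int, so the expression itself should
not overflow.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not Bacula code): convert whole days to seconds. */
long days_to_seconds(long days)
{
    assert(days >= 0);
    return days * 24L * 60L * 60L;
}

/* The expression from the patched lines, evaluated exactly as written. */
long patched_timeout(void)
{
    return 60L * 60L * 60L * 24L;
}

/* Returns 1 if the value can be stored in a plain int without overflow. */
int fits_in_int(long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

Here days_to_seconds(60) and patched_timeout() both give 5184000, and
fits_in_int(5184000) returns 1.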
Other than that, everything is the default code. Has anyone hit this problem
and found a solution? I can't easily re-run the job to reproduce it, since it
runs for over 9 days.
Thanks,
Mike
...
...
05-Dec 02:48 hulsbackup-sd JobId 109: Alert: Home page is http://smartmontools.sourceforge.net/
05-Dec 02:48 hulsbackup-sd JobId 109: Alert:
05-Dec 02:48 hulsbackup-sd JobId 109: Alert: TapeAlert: OK
05-Dec 02:48 hulsbackup-sd JobId 109: Alert:
05-Dec 02:48 hulsbackup-sd JobId 109: Alert: Error Counter logging not supported
05-Dec 02:48 hulsbackup-sd JobId 109: Sending spooled attrs to the Director. Despooling 196,979,273 bytes ...
05-Dec 03:12 hulsbackup-dir JobId 109: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
05-Dec 03:12 hulsbackup-dir JobId 109: Fatal error: No Job status returned from FD.
05-Dec 03:12 hulsbackup-dir JobId 109: Error: Bacula hulsbackup-dir 5.0.3 (04Aug10): 05-Dec-2011 03:12:15
Build OS: x86_64-unknown-linux-gnu redhat Enterprise release
JobId: 109
Job: ceserve1.2011-11-25_21.11.56_11
Backup Level: Full
Client: "ceserve1-fd" 5.0.3 (04Aug10) x86_64-unknown-linux-gnu,redhat,
FileSet: "ceserve1-data" 2011-11-02 11:03:12
Pool: "Default" (From Job resource)
Catalog: "MyCatalog" (From Client resource)
Storage: "Autochanger" (From command line)
Scheduled time: 25-Nov-2011 21:11:47
Start time: 25-Nov-2011 21:11:58
End time: 05-Dec-2011 03:12:15
Elapsed time: 9 days 6 hours 17 secs
Priority: 10
FD Files Written: 0
SD Files Written: 571,253
FD Bytes Written: 0 (0 B)
SD Bytes Written: 16,495,138,769,029 (16.49 TB)
Rate: 0.0 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: no
Volume name(s):
000093L3|000094L3|000095L3|000096L3|000097L3|000098L3|000099L3|000100L3|000101L3|000102L3|000103L3|000104L3|000105L3|000106L3|000107L3|000108L3|000109L3|000110L3|000111L3|000112L3|000113L3|000114L3|000115L3|000127L3|000117L3|000118L3|000119L3|000013L3|000121L3|000122L3|000123L3|000124L3|000125L3|000126L3|000166L3|000128L3|000129L3|000130L3|000131L3|000132L3
Volume Session Id: 2
Volume Session Time: 1322270042
Last Volume Bytes: 246,238,949,376 (246.2 GB)
Non-fatal FD errors: 0
SD Errors: 39
FD termination status: Error
SD termination status: OK
Termination: *** Backup Error ***
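
For a rough idea of the sustained throughput, the elapsed time and SD byte
count from the report above can be combined as in the sketch below. The
function names are hypothetical. The reported Rate of 0.0 KB/s presumably
reflects the zero FD byte count rather than the actual transfer speed, but
that is an assumption on my part.

```c
#include <assert.h>

/* Hypothetical helper: turn a days/hours/minutes/seconds duration into seconds. */
long elapsed_seconds(long days, long hours, long mins, long secs)
{
    assert(days >= 0 && hours >= 0 && mins >= 0 && secs >= 0);
    return ((days * 24L + hours) * 60L + mins) * 60L + secs;
}

/* Hypothetical helper: average rate in bytes per second over the whole job. */
double average_rate(long long bytes, long seconds)
{
    assert(seconds > 0);
    return (double)bytes / (double)seconds;
}
```

With the values from the report, elapsed_seconds(9, 6, 0, 17) is 799217
seconds, and average_rate(16495138769029LL, 799217L) comes out to roughly
20.6 MB/s.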
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users