Hi all,
I've recently set up a new Bacula director/storage daemon in preparation for
moving our existing backups to newer hardware. During testing, I've run into
problems restoring backups taken to disk; the restores fail with the messages:
Error: block.c:275 Volume data error at 24:4294944994! Wanted ID: "BB02", got
"". Buffer discarded.
Fatal error: fd_cmds.c:169 Command error with FD, hanging up.
Similar errors are reported for both file-level backups and block-level
backups made using bpipe. I've seen the instructions in
http://www.bacula.org/en/dev-manual/main/main/Restore_Command.html#SECTION0021100000000000000000,
but these only seem to apply to tape backups rather than disk ones.
Regardless, I've tried stripping the positional information from the bootstrap
file, with no effect.
Some relevant notes from my testing:
- The issue does not affect every backup made, but does affect a
significant proportion tested.
- A single job can be affected at multiple locations, i.e. skipping
one affected file might see the job fail again at a subsequent file.
- Attempting to restore the same job multiple times elicits failures
at the same block each time. Re-running the backup job may produce a
restorable backup, or one that fails at a different location. Other
jobs fail at different locations.
- All data is stored on ZFS, which reports no checksum errors at the
filesystem level.
- The server is not reporting any hardware issues, e.g. corrected or
uncorrectable memory reads, disk accesses etc.
- The backup jobs are multiple TB in size, and restores frequently
fail within the first couple hundred GB.
- The storage daemon is configured with a disk-changer backed
autochanger, writing to 100GB volumes, all residing within the same ZFS
filesystem (sitting atop a large RAID-Z2 disk array).
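
As a side note on the address in the error message: the sketch below decodes "24:4294944994" into a byte offset within the volume file. It assumes that, for file-type devices, Bacula reports a 64-bit offset as file:block, with the first number holding the high 32 bits and the second the low 32 bits, and that "100G" means 100 GiB. Both are assumptions and should be checked against the Bacula source; decode_address is a hypothetical helper, not a Bacula function.

```python
GIB = 1 << 30  # assumes "G" in the config means 2**30 bytes


def decode_address(file_index: int, block_index: int) -> int:
    """Combine a file:block pair into a byte offset (assumed high:low 32-bit split)."""
    return (file_index << 32) | block_index


# Address taken from the error message: 24:4294944994
offset = decode_address(24, 4294944994)

# Assumed size limit from "Maximum Volume Bytes = 100G"
volume_limit = 100 * GIB

print("byte offset in volume:", offset)
print("bytes before the 100 GiB limit:", volume_limit - offset)
```

If those assumptions hold, the failing read sits 22302 bytes short of the 100 GiB limit, i.e. at the very end of a full volume, which may be worth comparing against the failure addresses of the other affected jobs.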
The director is running "Version: 5.0.2 (28 April 2010) i386-pc-solaris2.10
solaris 5.10" (compiled on solaris 5.10, running on 5.11). Storage daemon runs
on the same machine as the director. (I'm loosely tied to this version so the
director can interact with a storage daemon on another machine connected to a
tape changer).
A sample client is running "Version: 5.2.13 (19 February 2013)
i386-pc-solaris2.11 solaris 5.11".
From my understanding of how the Bacula components fit together, I suspect the
corruption must be happening in the Storage daemon (since this is the only
component that would be interested in the BB02 block header?) before the data
is written to disk (otherwise ZFS would be reporting read/write errors).
Is this an issue that's been seen before on other disk backups? Can anyone
provide any assistance in locating and fixing the cause of the corruption? Any
help would be greatly appreciated.
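
One way to look for a damaged header outside of Bacula is to walk a volume file block by block. The sketch below assumes the BB02 block header is 24 bytes of big-endian fields (checksum, block size, block number, the 4-byte ID "BB02", VolSessionId, VolSessionTime) and that the block size includes the header. That layout is my reading of the Bacula developer documentation and should be verified before relying on it. first_bad_block and make_block are hypothetical helpers, and the demo runs on synthetic data only.

```python
import io
import struct
from typing import BinaryIO, Optional

# Assumed BB02 header layout: checksum, block size, block number,
# 4-byte ID, VolSessionId, VolSessionTime (all big-endian, 24 bytes).
HEADER = struct.Struct(">III4sII")
BLOCK_ID = b"BB02"


def first_bad_block(stream: BinaryIO) -> Optional[int]:
    """Return the offset of the first block whose header looks wrong, else None."""
    offset = 0
    while True:
        stream.seek(offset)
        raw = stream.read(HEADER.size)
        if len(raw) < HEADER.size:
            return None  # ran out of data before finding a bad header
        _checksum, block_size, _number, block_id, _sid, _stime = HEADER.unpack(raw)
        if block_id != BLOCK_ID or block_size < HEADER.size:
            return offset
        offset += block_size  # assumed: block size includes the header


def make_block(payload: bytes, block_id: bytes = BLOCK_ID) -> bytes:
    """Build a synthetic block for the demo (checksum left as 0, not verified)."""
    size = HEADER.size + len(payload)
    return HEADER.pack(0, size, 0, block_id, 1, 1) + payload


# Synthetic demo: two good blocks followed by one with a zeroed ID.
good = make_block(b"x" * 100)
damaged = make_block(b"y" * 10, block_id=b"\x00" * 4)
print(first_bad_block(io.BytesIO(good + good + damaged)))
print(first_bad_block(io.BytesIO(good + good)))
```

Against a real volume, the same function could be called on the file opened in binary mode. The offset it returns could then be compared with the address in the restore error.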
Regards,
Ben Roberts
IT Infrastructure
--- Relevant config excerpts:
Autochanger {
Name = backup3-autochanger
Device = drive-restore-backup3, drive-1-backup3
Device = drive-2-backup3, drive-3-backup3
Device = drive-4-backup3, drive-5-backup3
Changer Device = /data2/bacula/storage/backup3-autochanger.conf
Changer Command = "/opt/bacula/etc/disk-changer %c %o %S %a %d"
}
Device {
Name = drive-1-backup3
Archive Device = /data2/bacula/storage/backup3-autochanger/drive1
Device Type = File
Media Type = File-backup3
AutoChanger = yes
Removable media = no
Random access = yes
Requires Mount = no
Always Open = no
Label Media = yes
Maximum Changer Wait = 180
Drive Index = 1
Maximum Spool Size = 100G
}
...
Storage {
Name = backup3-sd
Address = backup3.local
Device = backup3-autochanger
Media Type = File-backup3
Autochanger = yes
}
Pool {
Name = Disk-45Day-backup3
Pool Type = Backup
Recycle = yes
AutoPrune = yes
Job Retention = 45 days
Volume Retention = 45 days
Label Format = Disk-45Day-backup3-
Storage = backup3-sd
Maximum Volume Bytes = 100G
}