Hi Martin, Kern,
I can confirm that recompiling on the target machine (and possibly the upgrade
to 5.2.13 done at the same time) fixed this, and I was able to restore 10TB over
the weekend from backups previously flagged as faulty. The root cause appears to
have been a binary incompatibility between the system libraries Bacula was
linking to at run time and those it was built against, which manifested only as
a read error during restores.
Thanks again for your help, very much appreciated!
Ben Roberts
IT Infrastructure
GSA Capital Partners LLP
Stratton House
5 Stratton Street
London W1J 8LA
D +44 (0)20 7959 7661
T +44 (0)20 7959 8800
www.gsacapital.com
From: Kern Sibbald [mailto:k...@sibbald.com]
Sent: 17 January 2014 19:35
To: Roberts, Ben; bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Errors restoring from disk backup: Volume data
error Wanted ID: "BB02", got ""
Hello Ben,
Great! Thanks for the feedback.
Good luck,
Kern
On 01/17/2014 06:34 PM, Roberts, Ben wrote:
Hi Kern,
I verified that the failures were happening with a 5.0.x FD as well as a 5.2.x
FD. At the time, I hadn't realised this combination was unsupported, or that it
was even happening. Following Martin's earlier observation that the corruption
was happening conveniently close to the 2^32 overflow boundary, I've recompiled
the director/SD (and took the opportunity to upgrade to 5.2.13) and am now
trying a full restore of one of the failing backups. So far 460GB has been
restored, which is a record for this server. It looks like the problem was
entirely my fault: I was using a copy of the DIR/SD compiled for one OS release
on a newer release of that OS, and the corruption was happening when the data
stream was read back in rather than while it was being written to the backup
volumes.
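The arithmetic behind the 2^32 observation can be sketched as follows. The
position is the one printed in the error message quoted further down
(24:4294944994). Treating it as a value held in an unsigned 32-bit field, and
the 65536-byte step, are assumptions made purely for illustration; neither is
confirmed from the Bacula source.

```python
# Illustrative only: how close the failing position is to the 32-bit limit.
# The position comes from the error message in this thread. The idea that a
# 32-bit field was involved is an assumption, not a confirmed diagnosis.

UINT32_LIMIT = 2**32          # 4294967296, first value that no longer fits in 32 bits
FAILING_POSITION = 4294944994  # from "Volume data error at 24:4294944994"
HYPOTHETICAL_STEP = 65536      # made-up read size, used only to show the wrap


def distance_to_limit(position: int) -> int:
    """Return how far `position` sits below the 2^32 boundary."""
    return UINT32_LIMIT - position


def wrap_uint32(value: int) -> int:
    """Return what an unsigned 32-bit field would hold after storing `value`."""
    return value & 0xFFFFFFFF


gap = distance_to_limit(FAILING_POSITION)
wrapped = wrap_uint32(FAILING_POSITION + HYPOTHETICAL_STEP)

print(f"distance below 2^32: {gap}")
print(f"next position if truncated to 32 bits: {wrapped}")
```

If a library or structure layout mismatch caused such a value to be truncated,
the reader would seek to a small offset instead of continuing past the
boundary, which would be consistent with finding an empty block ID there.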
I'll do a few more TB of restores to confirm the upgraded director is doing the
correct thing and that this doesn't need any further investigation from the
Bacula side.
Noted re keeping the DIR and SD on the same version. I have not mixed DIR and
SD versions here, and will not be attempting to.
Regards,
Ben Roberts
From: Kern Sibbald [mailto:k...@sibbald.com]
Sent: 17 January 2014 17:23
To: Roberts, Ben; bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Errors restoring from disk backup: Volume data
error Wanted ID: "BB02", got ""
Hello,
Every case of this particular error message that I have seen has been
due to data corruption outside of Bacula. Typically this happens when
a disk drive is bad, but since you are running ZFS and its checksums
are good, I can see only a few other possibilities:
1. The ZFS code is messed up. Running a current distribution with the
ZFS kernel module should not have this problem, but if you are running
something a bit older or using a user file system rather than the kernel
module you could have problems.
2. You have bad cables or a bad disk controller.
3. You seem to be using 5.2.x FDs with a 5.0.x Director/SD,
which is not supported. Your FDs should never be a higher
version than the DIR/SD, but may be lower. In addition, your
DIR and SD must always be the same version.
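The version rules in point 3 can be written down as a small check. This is a
sketch only: the function names are hypothetical, it is not part of Bacula, and
it compares full version numbers exactly as the rules are worded above.

```python
# Sketch of the version rules from point 3; not Bacula code.
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '5.2.13' into (5, 2, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_versions(dir_version: str, sd_version: str, fd_version: str) -> List[str]:
    """Return a list of rule violations (empty if the combination is allowed)."""
    problems = []
    dir_v = parse_version(dir_version)
    sd_v = parse_version(sd_version)
    fd_v = parse_version(fd_version)

    # Rule: the Director and Storage daemon must be the same version.
    if dir_v != sd_v:
        problems.append(f"DIR {dir_version} and SD {sd_version} differ")

    # Rule: a File daemon may be older than the DIR/SD, but never newer.
    if fd_v > min(dir_v, sd_v):
        problems.append(f"FD {fd_version} is newer than DIR/SD")

    return problems


# The combination described in this thread: 5.0.2 DIR/SD with a 5.2.13 FD.
print(check_versions("5.0.2", "5.0.2", "5.2.13"))
# The combination after the upgrade, with an older client still in use.
print(check_versions("5.2.13", "5.2.13", "5.0.2"))
```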
Oops, I just re-read your email and probably point 1 does not apply
since you seem to be running ZFS on Solaris so there is little or no
possibility that the code is bad.
Best regards,
Kern
On 01/14/2014 06:04 PM, Roberts, Ben wrote:
Hi all,
I've recently set up a new Bacula director/storage daemon in preparation to move
our existing backups to newer hardware. During testing, I've run into problems
doing restores of backups taken to disk, failing with the messages:
Error: block.c:275 Volume data error at 24:4294944994! Wanted ID: "BB02", got
"". Buffer discarded.
Fatal error: fd_cmds.c:169 Command error with FD, hanging up.
Similar errors are reported for both file-level backups, and block-level
backups made using bpipe. I've seen the instructions in
http://www.bacula.org/en/dev-manual/main/main/Restore_Command.html#SECTION0021100000000000000000,
but these only seem to apply to tape backups rather than disk ones.
Regardless, I've tried stripping the positional information from the bootstrap
file, with no effect.
Some relevant notes from my testing:
- The issue does not affect every backup made, but does affect a
significant proportion tested.
- A single job can be affected at multiple locations, i.e. skipping
one affected file might see the job fail again at a subsequent file.
- Attempting to restore the same job multiple times elicits failures
at the same block each time. Re-running the backup job may produce either a
restorable backup or one that fails at a different location. Other
jobs fail at different locations.
- All data is stored on ZFS, which reports no checksum errors at the
filesystem level.
- The server is not reporting any hardware issues, e.g. corrected or
uncorrectable memory reads, disk accesses etc.
- The backup jobs are multiple TB in size, and restores frequently
fail within the first couple hundred GB.
- The storage daemon is configured with a disk-changer backed
autochanger, writing to 100GB volumes, all residing within the same ZFS
filesystem (sitting atop a large RAID-Z2 disk array).
The director is running "Version: 5.0.2 (28 April 2010) i386-pc-solaris2.10
solaris 5.10" (compiled on solaris 5.10, running on 5.11). Storage daemon runs
on the same machine as the director. (I'm loosely tied to this version so the
director can interact with a storage daemon on another machine connected to a
tape changer).
A sample client is running "Version: 5.2.13 (19 February 2013)
i386-pc-solaris2.11 solaris 5.11".
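Since the director banner shows a build made for Solaris 5.10 running on 5.11,
a check along the following lines could flag that kind of mismatch. It is a
sketch under assumptions: the banner layout is inferred only from the two
strings quoted above, the names are hypothetical, and the running release is
passed in by hand (on a live system it might come from `uname -r` or Python's
`platform.release()`).

```python
# Sketch: compare the OS release a daemon was built for with the running one.
# The banner layout is inferred from the two strings quoted in this message.
import re
from dataclasses import dataclass

BANNER_RE = re.compile(
    r"Version:\s+(?P<version>\S+)\s+\((?P<date>[^)]*)\)\s+"
    r"(?P<triplet>\S+)\s+(?P<os_name>\S+)\s+(?P<os_release>\S+)"
)


@dataclass
class BuildInfo:
    version: str
    build_date: str
    host_triplet: str
    os_name: str
    os_release: str


def parse_banner(banner: str) -> BuildInfo:
    """Extract version and build platform from a 'Version: ...' banner."""
    match = BANNER_RE.search(banner)
    if match is None:
        raise ValueError(f"unrecognised banner: {banner!r}")
    return BuildInfo(
        version=match.group("version"),
        build_date=match.group("date"),
        host_triplet=match.group("triplet"),
        os_name=match.group("os_name"),
        os_release=match.group("os_release"),
    )


def built_for_release(info: BuildInfo, running_release: str) -> bool:
    """Return True if the binary was built for the OS release it runs on."""
    return info.os_release == running_release


director = parse_banner(
    "Version: 5.0.2 (28 April 2010) i386-pc-solaris2.10 solaris 5.10"
)
client = parse_banner(
    "Version: 5.2.13 (19 February 2013) i386-pc-solaris2.11 solaris 5.11"
)

# Both machines in this report run Solaris 5.11.
print("director built for running OS:", built_for_release(director, "5.11"))
print("client built for running OS:", built_for_release(client, "5.11"))
```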
From my understanding of how the Bacula components fit together, I suspect the
corruption must be happening in the Storage daemon (since this is the only
component that would be interested in the BB02 block header?) before the data
is written to disk (otherwise ZFS would be reporting read/write errors).
Is this an issue that's been seen before on other disk backups? Can anyone
provide any assistance in locating and fixing the cause of the corruption? Any
help would be greatly appreciated.
Regards,
Ben Roberts
IT Infrastructure
--- Relevant config excerpts:
Autochanger {
  Name = backup3-autochanger
  Device = drive-restore-backup3, drive-1-backup3
  Device = drive-2-backup3, drive-3-backup3
  Device = drive-4-backup3, drive-5-backup3
  Changer Device = /data2/bacula/storage/backup3-autochanger.conf
  Changer Command = "/opt/bacula/etc/disk-changer %c %o %S %a %d"
}
Device {
  Name = drive-1-backup3
  Archive Device = /data2/bacula/storage/backup3-autochanger/drive1
  Device Type = File
  Media Type = File-backup3
  AutoChanger = yes
  Removable media = no
  Random access = yes
  Requires Mount = no
  Always Open = no
  Label Media = yes
  Maximum Changer Wait = 180
  Drive Index = 1
  Maximum Spool Size = 100G
}
...
Storage {
  Name = backup3-sd
  Address = backup3.local
  Device = backup3-autochanger
  Media Type = File-backup3
  Autochanger = yes
}
Pool {
  Name = Disk-45Day-backup3
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Job Retention = 45 days
  Volume Retention = 45 days
  Label Format = Disk-45Day-backup3-
  Storage = backup3-sd
  Maximum Volume Bytes = 100G
}
________________________________
This email and any files transmitted with it contain confidential and
proprietary information and is solely for the use of the intended recipient. If
you are not the intended recipient please return the email to the sender and
delete it from your computer and you must not use, disclose, distribute, copy,
print or rely on this email or its contents. This communication is for
informational purposes only. It is not intended as an offer or solicitation for
the purchase or sale of any financial instrument or as an official confirmation
of any transaction. Any comments or statements made herein do not necessarily
reflect those of GSA Capital. GSA Capital Partners LLP is authorised and
regulated by the Financial Conduct Authority and is registered in England and
Wales at Stratton House, 5 Stratton Street, London W1J 8LA, number OC309261.
GSA Capital Services Limited is registered in England and Wales at the same
address, number 5320529.
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users