Hello Ben,
Great! Thanks for the feedback.
Good luck,
Kern
On 01/17/2014 06:34 PM, Roberts, Ben wrote:
>
> Hi Kern,
>
>
>
> I verified that the failures were happening on a 5.0.x FD as well as a
> 5.2.x FD. At the time, I hadn't realised that the version mix was
> unsupported, or even that it was happening. Following Martin's earlier
> observation that the corruption was happening suspiciously close to
> the 2^32 overflow boundary, I've recompiled the director/SD (and took
> the opportunity to upgrade to 5.2.13) and am trying a full restore of
> one of the failing backups now. So far 460GB has been restored, which
> is a record for this server. It looks like the problem was entirely my
> fault: I was using a copy of the DIR/SD compiled for one OS release on
> a newer release of that OS, and the corruption was happening when the
> data stream was read back in rather than while it was being written to
> the backup volumes.
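
For reference, the arithmetic behind the 2^32 observation can be checked
with a short sketch. It takes the address 24:4294944994 from the error
message in the original report (quoted further down) and assumes that the
second field is an unsigned 32-bit value. That reading of the address
format is an assumption and is not confirmed in this thread. The function
name is hypothetical.

```python
# Assumption: the second field of "24:4294944994" is an unsigned 32-bit value.
UINT32_LIMIT = 2 ** 32  # 4294967296, the point where a 32-bit counter wraps


def distance_to_wrap(block_address: int) -> int:
    """Return how far below the 2^32 boundary a block address sits."""
    if not 0 <= block_address < UINT32_LIMIT:
        raise ValueError("address does not fit in 32 bits")
    return UINT32_LIMIT - block_address


reported = 4294944994  # taken from the "Volume data error" message
print(distance_to_wrap(reported))  # 22302
```

Under that assumption the failing address is only about 22 KB short of the
wrap point, which fits a 32-bit overflow somewhere in the read path.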
>
>
>
> I'll do a few more TB of restores to confirm the upgraded director is
> doing the correct thing and that this doesn't need any further
> investigation from the Bacula side.
>
>
>
> Noted re: running the same version of DIR/SD. I have not mixed, and
> will not be mixing, versions here.
>
>
>
> Regards,
>
>
>
> *Ben Roberts*
>
> IT Infrastructure
>
> *GSA Capital Partners LLP*
>
> Stratton House
>
> 5 Stratton Street
>
> London W1J 8LA
>
> *D* +44 (0)20 7959 7661
>
> *T* +44 (0)20 7959 8800
>
>
> www.gsacapital.com <http://www.gsacapital.com/>
>
>
>
>
>
> *From:*Kern Sibbald [mailto:k...@sibbald.com]
> *Sent:* 17 January 2014 17:23
> *To:* Roberts, Ben; bacula-users@lists.sourceforge.net
> *Subject:* Re: [Bacula-users] Errors restoring from disk backup:
> Volume data error Wanted ID: "BB02", got ""
>
>
>
> Hello,
>
> Every case of this particular error message that I have seen has been
> due to data corruption outside of Bacula. Typically this happens when
> a disk drive is bad, but since you are running ZFS and its checksums
> are good, I can see only a few other possibilities:
>
> 1. The ZFS code is messed up. Running a current distribution with the
> ZFS kernel module should not have this problem, but if you are running
> something a bit older, or using a user-space file system rather than
> the kernel module, you could have problems.
>
> 2. You have bad cables or a bad disk controller.
>
> 3. You seem to be using 5.2.x FDs with 5.0.x Director/SD,
> which is not supported. Your FDs should never be a higher
> version than the DIR/SD, but may be lower. In addition, your
> DIR and SD must always be the same version.
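
The rule in point 3 (FD no newer than the DIR/SD, and DIR and SD identical)
can be sketched as a small check. This is only an illustration: the function
names are hypothetical, nothing here is part of Bacula, and comparing the
full dotted version (rather than just major.minor) is an assumption.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.2.13' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_versions(dir_version, sd_version, fd_version):
    """Return a list of rule violations; an empty list means the mix is OK."""
    problems = []
    dir_v = parse_version(dir_version)
    sd_v = parse_version(sd_version)
    fd_v = parse_version(fd_version)
    if dir_v != sd_v:
        problems.append("DIR and SD versions differ")
    if fd_v > dir_v:
        problems.append("FD is newer than DIR")
    return problems


# The setup from the original report: 5.0.2 DIR/SD with a 5.2.13 client.
print(check_versions("5.0.2", "5.0.2", "5.2.13"))  # ['FD is newer than DIR']
```

An older FD against a newer DIR/SD, for example 5.0.2 against 5.2.13,
returns an empty list.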
>
> Oops, I just re-read your email and probably point 1 does not apply
> since you seem to be running ZFS on Solaris so there is little or no
> possibility that the code is bad.
>
> Best regards,
> Kern
>
> On 01/14/2014 06:04 PM, Roberts, Ben wrote:
>
> Hi all,
>
>
>
> I've recently set up a new Bacula director/storage daemon in
> preparation to move our existing backups to newer hardware. During
> testing, I've run into problems doing restores of backups taken to
> disk, failing with the messages:
>
>
>
> Error: block.c:275 Volume data error at 24:4294944994! Wanted ID:
> "BB02", got "". Buffer discarded.
>
> Fatal error: fd_cmds.c:169 Command error with FD, hanging up.
>
>
>
> Similar errors are reported for both file-level backups, and
> block-level backups made using bpipe. I've seen the instructions
> in
>
> http://www.bacula.org/en/dev-manual/main/main/Restore_Command.html#SECTION0021100000000000000000,
> but these only seem to apply to tape backups rather than disk
> ones. Regardless, I've tried stripping the positional information
> from the bootstrap file, with no effect.
>
>
>
> Some relevant notes from my testing:
>
> - The issue does not affect every backup made, but does
> affect a significant proportion tested.
>
> - A single job can be affected at multiple locations,
> i.e. skipping one affected file might see the job fail again at a
> subsequent file.
>
> - Attempting to restore the same job multiple times
> elicits failures at the same block each time. Re-running the backup
> job may produce a restorable backup, or one that fails at a
> different location. Other jobs fail at different locations.
>
> - All data is stored on ZFS, which reports no checksum
> errors at the filesystem level.
>
> - The server is not reporting any hardware issues, e.g.
> corrected or uncorrectable memory reads, disk accesses etc.
>
> - The backup jobs are multiple TB in size, and restores
> frequently fail within the first couple hundred GB.
>
> - The storage daemon is configured with a disk-changer
> backed autochanger, writing to 100GB volumes, all residing within
> the same ZFS filesystem (sitting atop a large RAID-Z2 disk array).
>
>
>
> The director is running "Version: 5.0.2 (28 April 2010)
> i386-pc-solaris2.10 solaris 5.10" (compiled on solaris 5.10,
> running on 5.11). Storage daemon runs on the same machine as the
> director. (I'm loosely tied to this version so the director can
> interact with a storage daemon on another machine connected to a
> tape changer).
>
> A sample client is running "Version: 5.2.13 (19 February 2013)
> i386-pc-solaris2.11 solaris 5.11".
>
>
>
> From my understanding of how the Bacula components fit together, I
> suspect the corruption must be happening in the Storage daemon
> (since this is the only component that would be interested in the
> BB02 block header?) before the data is written to disk (otherwise
> ZFS would be reporting read/write errors).
>
>
>
> Is this an issue that's been seen before on other disk backups?
> Can anyone provide any assistance in locating and fixing the cause
> of the corruption? Any help would be greatly appreciated.
>
>
>
> Regards,
>
>
>
> *Ben Roberts*
>
> IT Infrastructure
>
>
>
> --- Relevant config excerpts:
>
>
>
> Autochanger {
>
> Name = backup3-autochanger
>
> Device = drive-restore-backup3, drive-1-backup3
>
> Device = drive-2-backup3, drive-3-backup3
>
> Device = drive-4-backup3, drive-5-backup3
>
> Changer Device = /data2/bacula/storage/backup3-autochanger.conf
>
> Changer Command = "/opt/bacula/etc/disk-changer %c %o %S %a %d"
>
> }
>
>
>
> Device {
>
> Name = drive-1-backup3
>
> Archive Device = /data2/bacula/storage/backup3-autochanger/drive1
>
> Device Type = File
>
> Media Type = File-backup3
>
> AutoChanger = yes
>
> Removable media = no
>
> Random access = yes
>
> Requires Mount = no
>
> Always Open = no
>
> Label Media = yes
>
> Maximum Changer Wait = 180
>
> Drive Index = 1
>
> Maximum Spool Size = 100G
>
> }
>
> ...
>
>
>
> Storage {
>
> Name = backup3-sd
>
> Address = backup3.local
>
> Device = backup3-autochanger
>
> Media Type = File-backup3
>
> Autochanger = yes
>
> }
>
>
>
> Pool {
>
> Name = Disk-45Day-backup3
>
> Pool Type = Backup
>
> Recycle = yes
>
> AutoPrune = yes
>
> Job Retention = 45 days
>
> Volume Retention = 45 days
>
> Label Format = Disk-45Day-backup3-
>
> Storage = backup3-sd
>
> Maximum Volume Bytes = 100G
>
> }
>
>
>
> ------------------------------------------------------------------------
>
> This email and any files transmitted with it contain confidential
> and proprietary information and is solely for the use of the
> intended recipient. If you are not the intended recipient please
> return the email to the sender and delete it from your computer
> and you must not use, disclose, distribute, copy, print or rely on
> this email or its contents. This communication is for
> informational purposes only. It is not intended as an offer or
> solicitation for the purchase or sale of any financial instrument
> or as an official confirmation of any transaction. Any comments or
> statements made herein do not necessarily reflect those of GSA
> Capital. GSA Capital Partners LLP is authorised and regulated by
> the Financial Conduct Authority and is registered in England and
> Wales at Stratton House, 5 Stratton Street, London W1J 8LA, number
> OC309261. GSA Capital Services Limited is registered in England
> and Wales at the same address, number 5320529.
>
>
>
>
>
> ------------------------------------------------------------------------------
>
> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
>
> Learn Why More Businesses Are Choosing CenturyLink Cloud For
>
> Critical Workloads, Development Environments & Everything In Between.
>
> Get a Quote or Start a Free Trial Today.
>
>
> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
>
>
>
>
> _______________________________________________
>
> Bacula-users mailing list
>
> Bacula-users@lists.sourceforge.net
> <mailto:Bacula-users@lists.sourceforge.net>
>
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
>
>
>