Hi all,

I've recently set up a new Bacula director/storage daemon in preparation for
moving our existing backups to newer hardware. During testing, I've run into
problems restoring backups written to disk. The restores fail with these
messages:



Error: block.c:275 Volume data error at 24:4294944994! Wanted ID: "BB02", got 
"". Buffer discarded.

Fatal error: fd_cmds.c:169 Command error with FD, hanging up.
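
For reference, the address in the first message can be converted into a byte offset within the volume. The sketch below rests on two assumptions that are not confirmed anywhere in this post: that for file volumes the "file:block" pair is the high and low 32-bit halves of a 64-bit byte offset, and that the "G" suffix in Maximum Volume Bytes means 2**30 bytes. The function name volume_offset is made up for illustration.

```python
# Assumption: for file volumes, "file:block" in the error message is the
# high and low 32-bit halves of a 64-bit byte offset into the volume.
FILE_PART = 24
BLOCK_PART = 4294944994

# Assumption: "100G" in Maximum Volume Bytes means 100 * 2**30 bytes.
MAX_VOLUME_BYTES = 100 * 2**30


def volume_offset(file_part: int, block_part: int) -> int:
    """Combine the two 32-bit halves into a single byte offset."""
    return (file_part << 32) | block_part


offset = volume_offset(FILE_PART, BLOCK_PART)
print("byte offset of failing block:", offset)
print("bytes short of the volume limit:", MAX_VOLUME_BYTES - offset)
```

If both assumptions hold, the offset is 107374160098, which is 22302 bytes short of the 100G limit. That would put the unreadable block at the very end of a full volume rather than at an arbitrary point inside it.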

Similar errors are reported for both file-level backups and block-level
backups made using bpipe. I've seen the instructions at
http://www.bacula.org/en/dev-manual/main/main/Restore_Command.html#SECTION0021100000000000000000
but these only seem to apply to tape backups rather than disk ones.
Regardless, I've tried stripping the positional information from the bootstrap
file, with no effect.
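
To make the stripping step concrete, here is a minimal sketch of the kind of edit involved. It assumes the positional records in a bootstrap file are the lines whose keyword is VolFile, VolBlock or VolAddr; that keyword set, the function name strip_positional and the sample volume name are illustrative assumptions, not taken from an actual bootstrap file from this setup.

```python
# Assumed set of positional keywords in a bootstrap (.bsr) file.
POSITIONAL_KEYS = {"volfile", "volblock", "voladdr"}


def strip_positional(bsr_text: str) -> str:
    """Return the bootstrap text with positional lines removed."""
    kept = []
    for line in bsr_text.splitlines():
        # The keyword is everything before the first "=".
        key = line.split("=", 1)[0].strip().lower()
        if key in POSITIONAL_KEYS:
            continue
        kept.append(line)
    return "".join(l + "\n" for l in kept)


# Hypothetical sample input, for illustration only.
sample = (
    'Volume="Disk-0001"\n'
    "VolSessionId=1\n"
    "VolFile=24-24\n"
    "VolBlock=0-100\n"
    "FileIndex=1-5\n"
)
print(strip_positional(sample), end="")
```

With the positional lines gone, the storage daemon has to scan the volume for the matching session records instead of seeking directly to them.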

Some relevant notes from my testing:

- The issue does not affect every backup made, but it does affect a
  significant proportion of those tested.

- A single job can be affected at multiple locations, i.e. skipping one
  affected file might see the job fail again at a subsequent file.

- Restoring the same job multiple times fails at the same block each time.
  Re-running the backup job may produce a restorable backup, or it may produce
  one that fails at a different location. Different jobs fail at different
  locations.

- All data is stored on ZFS, which reports no checksum errors at the
  filesystem level.

- The server is not reporting any hardware issues, e.g. corrected or
  uncorrectable memory errors or disk access errors.

- The backup jobs are multiple TB in size, and restores frequently fail within
  the first couple of hundred GB.

- The storage daemon is configured with a disk-changer-backed autochanger,
  writing to 100GB volumes, all residing within the same ZFS filesystem
  (sitting atop a large RAID-Z2 disk array).

The director is running "Version: 5.0.2 (28 April 2010) i386-pc-solaris2.10
solaris 5.10" (compiled on Solaris 5.10, running on 5.11). The storage daemon
runs on the same machine as the director. (I'm loosely tied to this version so
that the director can still talk to a storage daemon on another machine, which
is connected to a tape changer.)

A sample client is running "Version: 5.2.13 (19 February 2013)
i386-pc-solaris2.11 solaris 5.11".

From my understanding of how the Bacula components fit together, I suspect the
corruption must be happening in the storage daemon (since this is the only
component that would be interested in the BB02 block header?) before the data
is written to disk (otherwise ZFS would be reporting read/write errors).

Is this an issue that's been seen before on other disk backups? Can anyone 
provide any assistance in locating and fixing the cause of the corruption? Any 
help would be greatly appreciated.

Regards,

Ben Roberts

IT Infrastructure


--- Relevant config excerpts:

Autochanger {
  Name = backup3-autochanger
  Device = drive-restore-backup3, drive-1-backup3
  Device = drive-2-backup3, drive-3-backup3
  Device = drive-4-backup3, drive-5-backup3
  Changer Device = /data2/bacula/storage/backup3-autochanger.conf
  Changer Command = "/opt/bacula/etc/disk-changer %c %o %S %a %d"
}

Device {
  Name = drive-1-backup3
  Archive Device = /data2/bacula/storage/backup3-autochanger/drive1
  Device Type = File
  Media Type = File-backup3
  AutoChanger = yes
  Removable media = no
  Random access = yes
  Requires Mount = no
  Always Open = no
  Label Media = yes
  Maximum Changer Wait = 180
  Drive Index = 1
  Maximum Spool Size = 100G
}
...

Storage {
  Name = backup3-sd
  Address = backup3.local
  Device = backup3-autochanger
  Media Type = File-backup3
  Autochanger = yes
}

Pool {
  Name = Disk-45Day-backup3
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Job Retention = 45 days
  Volume Retention = 45 days
  Label Format = Disk-45Day-backup3-
  Storage = backup3-sd
  Maximum Volume Bytes = 100G
}

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users