>>>>> On Wed, 8 May 2019 09:28:05 -0700, Mike Benoit said:
>
> Clearly the verify job either isn't verifying anything or isn't verifying
> what I expect it to be verifying.

Was the test restore directory empty before you did the restore?

> I noticed in the v7.4.0 release notes
> there was a new feature of "Level=Data" for verification jobs, but I
> couldn't actually find any more information on that in the documentation,
> just information about "DiskToCatalog" and "VolumeToCatalog" verification
> levels. Does "Level=Data" actually exist, or is there additional
> documentation on it somewhere that I'm missing? This is all critical data,
> so we want to do as much verification as possible on it which should in
> theory mimic everything a restore would do so we can be 100% certain the
> data was intact at that time.

There is a little more documentation here:

https://www.bacula.org/9.4.x-manuals/en/main/New_Features_in_7_4_0.html#SECTION00621000000000000000

and here:

https://www.bacula.org/9.4.x-manuals/en/main/New_Features_in_9_0_0.html#SECTION005013000000000000000

> The next question is why is the data being corrupted to begin with? The
> bacula server uses a RAID1 BTRFS array to store the pool volumes on and
> doing a BTRFS scrub on the entire block device shows no checksum errors
> whatsoever. There are also no hardware errors appearing in the dmesg logs,
> and SMART monitoring on the drives isn't showing any errors (drives are a
> few months old).
>
> Any ideas what would cause bacula to show checksum errors but BTRFS isn't?

That suggests the data was corrupted after Bacula's checksum but before BTRFS
computed its checksum.

You could try running bls on that volume to see if it also detects the
checksum mismatch. Something like

  bls -j /home/backup/Vol0043

If that does detect it, then try with bls -j -p to see if there are more
errors. It may be worthwhile testing other volumes as well.

> The SQL backup jobs are being run across multiple servers in multiple
> countries, and they seem to be the only jobs that we have experienced the
> checksum errors with so far and it's happening almost every night when the
> jobs are scheduled to run. If we run the jobs manually the next morning
> everything works fine and the restore succeeds without a problem.

Does the SD have ECC RAM?

__Martin