Hi, thanks for the suggestions. Sorry it took me a few days to respond;
there's not much time for testing between daily backup cycles.
My current theory is that there is some strange corruption in my DB,
perhaps in the File table.
I'm not sure but I think this may be related to another problem I'm
having. Restores (using option 3 or 5) of very large jobs (around 200
GB) fail while writing the restore bootstrap. I suspect that while
reading my catalog to generate the restore.bsr, bacula is encountering
some corruption, which may also explain the strange garbage in my other
restores.
This isn't necessarily pertinent, but I have seen a couple of interesting
results with these large restores, varying from the SD segfaulting
immediately to sitting in an infinite loop, eating about half the system
memory, and then segfaulting (no, it's not using the TLS lib). Here's a
gdb trace of the latter behavior if you are curious:
http://www.duke.edu/~ttoomey/misc/bacula-sd-debug.3.txt.gz
After I ran dbcheck, the restore did cough up an error before the SD
died:
25-Jul 11:05 fury: restore.2005-07-25_11.03.07 Fatal error: Bootstrap
file error: expected an integer or a range, got T_EOL: =
: Line 5543394, col 10 of file
/var/bacula/fury.restore.2005-07-25_11.03.07.bootstrap
FileIndex=
25-Jul 11:05 fury: restore.2005-07-25_11.03.07 Fatal error: job.c:1662
Comm error with SD. bad response to Bootstrap. ERR=No data available
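
For anyone who wants to check a generated bootstrap for the same problem, here is a minimal Python sketch that lists every entry with nothing after the "=", like the FileIndex= line above. It assumes the bootstrap is plain text with one keyword=value entry per line, which is what the error message suggests; the function names are my own and not part of bacula.

```python
def find_empty_values(lines):
    """Yield (line_number, key) for each 'key=' entry that has no value.

    Assumption: one keyword=value entry per line, as the error above implies.
    """
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        if sep and not value.strip():
            yield lineno, key.strip()


def scan_bootstrap(path):
    """Return all empty-valued entries found in the bootstrap file at path."""
    # The file can be millions of lines long, so it is read line by line.
    with open(path, "r", errors="replace") as fh:
        return list(find_empty_values(fh))
```

Calling scan_bootstrap("/var/bacula/fury.restore.2005-07-25_11.03.07.bootstrap") would return a list of (line number, keyword) pairs, which shows whether the empty FileIndex is a one-off or a pattern.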
I plan on filing a bug about the SD issue after I do some more testing
to try and isolate the problem. I think, whatever the corruption is, it
should probably be handled more gracefully by the SD (if my theory is
right).
Err.. anyway, please see below for my answers.
Martin Simmons wrote:
Theron> Hello,
Theron> I'm seeing some strange behavior with restores under 1.36.3/RHEL 3 using
Theron> an AIT-3 drive. I'm not quite sure what is causing it and I'd really
Theron> appreciate any suggestions.
Theron> When I choose restore option 5 (Select the most recent backup) bacula
Theron> proceeds to restore data from the last full and subsequent diff/incr
Theron> jobs. However, for large restores (>50 GB), I notice a few dozen error
Theron> messages like:
Theron> Error: attribs.c:339 File size of restored file /foo/bar not correct.
Theron> Original [file size], restored [large, bogus file size].
Theron> Comparing the restores against the live data, I see that the restored
Theron> files have lots of random garbage inserted/appended to them.
Theron> However, when I manually find the jobIDs of the full/diffs/incrs and
Theron> restore them individually with restore option 3, there is no corruption
Theron> and the files all seem fine.
Does "individually" mean one at a time, i.e. repeated use of option 3? If so,
do you get corruption if you enter all the jobIDs into a single option 3 in
the same order as bacula chose from option 5?
Yes, individually means one at a time with repeated use of option 3. If
I enter the same JobIDs from option 5 into option 3, I see exactly the
same corruption on the same files as when I use option 5.
Theron> Most of the corrupt files are older than the last full; Perhaps there's
Theron> something in the diff/incr jobs that corrupts the files from the full
Theron> job. However, most of the corrupt files are older than the last full and
Theron> so are not even present in the diff/incr jobs.
Theron> Has anyone seen behavior like this or have any ideas about where to look?
For a particular restore, is it always the same files that are corrupted? If
yes, is the garbage really random or is it the same garbage each time? Also,
what happens if you use option 5 but only mark one of the corrupted files for
restore?
If I perform two identical restores, I see the same files corrupted with
the same garbage. The md5sums of the corresponding files from each
restore match, so the garbage isn't random, or at least it's
dependent on something else.
If I mark just one of the corrupted files for restore, there is no
corruption in the file.
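
In case it helps anyone reproduce the md5sum comparison, here is a rough Python sketch of the idea: hash every file under two restore directories and list the relative paths whose digests differ. The function names and the two-directory layout are illustrative assumptions, not how I actually ran the comparison.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are compared.
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = md5_of_file(full)
    return digests


def differing_files(root_a, root_b):
    """Return sorted relative paths that differ or exist in only one tree."""
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Running differing_files() on two restores of the same job tells you whether the garbage repeats (empty list); running it on a restore and the live data lists the corrupted files.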
Thanks for your help, I hope I'm on the right track.
--
Theron Toomey, System Administrator
NSEES-IT (919-613-8148)