On Thursday 04 August 2005 19:42, Theron Toomey wrote:
> Hello,
> I initially thought this problem was due to corruption in my database.
> However, the behavior seems to be caused by the SD, FD, and DIR sharing
> a working directory. When I assign the FD a different working-dir from
> the DIR/SD (e.g. WorkingDirectory = "/var/bacula/fd"), my restores work
> perfectly.

Yes, thanks for figuring this out.  The problem is that you did not give
your daemons unique names, as is recommended and as is the default. I'll
improve the documentation on this.
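
To illustrate the point about unique names, here is a minimal Python sketch that flags daemons sharing both a name and a working directory. It is not part of Bacula. The function `find_collisions`, the tuple layout, and the `fury-dir`/`fury-sd`/`fury-fd` names are hypothetical, and treating all three daemons as named `fury` is an assumption based on the log lines quoted further down. In a real setup the values would be read from the Name and WorkingDirectory directives of each daemon's configuration file.

```python
from collections import defaultdict

def find_collisions(daemons):
    """Return the (name, working_dir) pairs claimed by more than one daemon.

    `daemons` is a list of (kind, name, working_dir) tuples. This layout is
    hypothetical and only serves the illustration.
    """
    seen = defaultdict(list)
    for kind, name, working_dir in daemons:
        # Normalise trailing slashes so "/var/bacula/" equals "/var/bacula".
        key = (name, working_dir.rstrip("/") or "/")
        seen[key].append(kind)
    return {key: kinds for key, kinds in seen.items() if len(kinds) > 1}

# Assumed problem case: three daemons, one name, one working directory.
shared = [
    ("DIR", "fury", "/var/bacula"),
    ("SD", "fury", "/var/bacula"),
    ("FD", "fury", "/var/bacula"),
]

# Hypothetical fix: each daemon gets its own name, same directory.
unique = [
    ("DIR", "fury-dir", "/var/bacula"),
    ("SD", "fury-sd", "/var/bacula"),
    ("FD", "fury-fd", "/var/bacula"),
]

print(find_collisions(shared))
print(find_collisions(unique))
```

The first call reports one colliding pair and the second reports none, which matches the idea that unique names should make a shared working directory safe.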

> I have opened a bug on this here:
> http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000398
> GDB tracebacks, errors, conf files, and bconsole output demonstrating
> the problem:
> http://www.duke.edu/~ttoomey/misc/ttoomey-bacula-wd-dbg.20050803.tar.gz
> As a workaround, I have separated my FD working dir from the DIR/SD.
> Curiously, when I separate the DIR and SD working dirs (so each daemon
> has its own dir), my autochanger stops working. That's a different issue
> though, and one that I haven't had time to investigate.
> Thanks for all your help.
> Kern Sibbald wrote:
> > On Tuesday 26 July 2005 04:10, Theron Toomey wrote:
> >>Hi, thanks for the suggestions. Sorry it took me a few days to respond-
> >>there's not much time for testing between daily backup cycles.
> >>
> >>My current theory is that there is some strange corruption in my DB,
> >>perhaps in the File table.
> >>
> >>I'm not sure but I think this may be related to another problem I'm
> >>having. Restores (using option 3 or 5) of very large jobs (around 200
> >>GB) fail while writing the restore bootstrap. I suspect that while
> >>reading my catalog to generate the restore.bsr, bacula is encountering
> >>some corruption, which may also explain the strange garbage in my other
> >>restores.
> >
> > Could you send me the console and any output from this "fail" so I can
> > see what is going on.
> >
> >>This isn't necessarily pertinent but I have seen a couple interesting
> >>results with these large restores, varying from the SD segfaulting
> >>immediately to sitting in an infinite loop, eating about half the system
> >>memory, and then segfaulting (no, it's not using the tls lib). Here's a
> >>gdb trace of the latter behavior if you are curious:
> >>http://www.duke.edu/~ttoomey/misc/bacula-sd-debug.3.txt.gz
> >
> > Could you send me the bootstrap file from this?  When doing the restore
> > and it reaches the question yes/mod/no, it will have printed the location
> > of the bootstrap file just prior to issuing the prompt. Before answering
> > the prompt, you can copy it to another location (after answering the
> > prompt, it usually deletes the file).
> >
> >>After running dbcheck, it did cough up an error while restoring before
> >>the SD died:
> >>25-Jul 11:05 fury: restore.2005-07-25_11.03.07 Fatal error: Bootstrap
> >>file error: expected an integer or a range, got T_EOL: =
> >>   : Line 5543394, col 10 of file
> >>/var/bacula/fury.restore.2005-07-25_11.03.07.bootstrap
> >>FileIndex=
> >>25-Jul 11:05 fury: restore.2005-07-25_11.03.07 Fatal error: job.c:1662
> >>Comm error with SD. bad response to Bootstrap. ERR=No data available
> >>
> >>I plan on filing a bug about the SD issue after I do some more testing
> >>to try and isolate the problem. I think, whatever the corruption is, it
> >>should probably be handled more gracefully by the SD (if my theory is
> >>right).
> >
> > Yes, please do open a bug report -- preferably one for each problem that
> > you consider unrelated.  I am unable to adequately track and resolve
> > complicated problems such as this from emails.
> >
> >>Err.. anyway, please see below for my answers.
> >>
> >>Martin Simmons wrote:
> >>>  Theron> Hello,
> >>>  Theron> I'm seeing some strange behavior with restores under
> >>>  Theron> 1.36.3/RHEL 3 using an AIT-3 drive. I'm not quite sure what is
> >>>  Theron> causing it and I'd really appreciate any suggestions.
> >>>
> >>>  Theron> When I choose restore option 5 (Select the most recent backup)
> >>>  Theron> bacula proceeds to restore data from the last full and
> >>>  Theron> subsequent diff/incr jobs. However, for large restores (>50
> >>>  Theron> GB), I notice a few dozen error messages like:
> >>>  Theron>   Error: attribs.c:339 File size of restored file /foo/bar not
> >>>  Theron>   correct. Original [file size], restored [large, bogus file
> >>>  Theron>   size].
> >>>
> >>>  Theron> Comparing the restores against the live data, I see that the
> >>>  Theron> restored files have lots of random garbage inserted/appended to
> >>>  Theron> them.
> >>>
> >>>  Theron> However, when I manually find the jobIDs of the
> >>>  Theron> full/diffs/incrs and restore them individually with restore
> >>>  Theron> option 3, there is no corruption and the files all seem fine.
> >>>
> >>>Does "individually" mean one at a time, i.e. repeated use of option 3?
> >>>If so, do you get corruption if you enter all the jobIDs into a single
> >>>option 3 in the same order as bacula chose from option 5?
> >>
> >>Yes, individually means one at a time with repeated use of option 3. If
> >>I enter the same JobID's from option 5 into option 3, I see exactly the
> >>same corruption on the same files as when I use option 5.
> >>
> >>>  Theron> Most of the corrupt files are older than the last full;
> >>>  Theron> Perhaps there's something in the diff/incr jobs that corrupts
> >>>  Theron> the files from the full job. However, most of the corrupt files
> >>>  Theron> are older than the last full and so are not even present in the
> >>>  Theron> diff/incr jobs.
> >>>
> >>>  Theron> Has anyone seen behavior like this or have any ideas about
> >>>  Theron> where to look?
> >>>
> >>>For a particular restore, is it always the same files that are
> >>>corrupted? If yes, is the garbage really random or is it the same
> >>>garbage each time?  Also, what happens if you use option 5 but only
> >>>mark one of the corrupted files for restore?
> >>
> >>If I perform two identical restores, I see the same files corrupted with
> >>the same garbage. The md5sums of the corresponding files from each
> >>restore are a match so the garbage isn't random or at least it's
> >>dependent on something else.
> >>
> >>If I mark just one of the corrupted files for restore, there is no
> >>corruption in the file.
> >
> > Can you send me the corresponding bootstrap files for these two cases so
> > that I can compare them.
> >
> > What database are you using, and can you give me an idea how big it is?
> >
> >>Thanks for your help, I hope I'm on the right track.
> >
> > Well, you are certainly doing the right things, but it is a bit early to
> > tell what the right track really is ...
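
The bootstrap error quoted earlier in this thread points at a line reading `FileIndex=` with nothing after the equals sign. As a rough aid for inspecting a saved copy of such a file, the following Python sketch lists every `keyword=` line whose value is empty. The function name `find_empty_values` and the sample values are hypothetical, and the check covers only this one symptom, not the full bootstrap grammar.

```python
def find_empty_values(lines):
    """Yield (line_number, keyword) for each 'keyword=' line with no value."""
    for number, raw in enumerate(lines, start=1):
        key, sep, value = raw.strip().partition("=")
        # sep is "" when the line has no "=" at all, so such lines are skipped.
        if sep and not value.strip():
            yield number, key.strip()

# Hypothetical bootstrap fragment; the last line mimics the reported error.
sample = [
    'Volume="Vol001"',
    "VolSessionId=1",
    "VolSessionTime=1122300000",
    "FileIndex=1-25",
    "FileIndex=",
]

print(list(find_empty_values(sample)))
```

For a real file, an open file object can be passed instead of the list, for example `find_empty_values(open(path))` with `path` pointing at the copied bootstrap file. Iterating line by line keeps memory use flat even for a file with several million lines.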

Best regards,


