On Thursday 04 August 2005 19:42, Theron Toomey wrote:
> Hello,
> I initially thought this problem was due to corruption in my database.
> However, the behavior seems to be caused by the SD, FD, and DIR sharing
> a working directory. When I assign the FD a different working-dir from
> the DIR/SD (e.g. WorkingDirectory = "/var/bacula/fd"), my restores work
> perfectly.
Yes, thanks for figuring this out. The problem is that you did not give your
daemons unique names, as is recommended and as is the default. I'll improve
the documentation on this.

>
> I have opened a bug on this here:
> http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000398
>
> GDB tracebacks, errors, conf files, and bconsole output demonstrating
> the problem:
> http://www.duke.edu/~ttoomey/misc/ttoomey-bacula-wd-dbg.20050803.tar.gz
>
> As a workaround, I have separated my FD working dir from the DIR/SD.
> Curiously, when I separate the DIR and SD working dirs (so each daemon
> has its own dir), my autochanger stops working. That's a different issue
> though, and one that I haven't had time to investigate.
>
> Thanks for all your help.
>
> Kern Sibbald wrote:
> > On Tuesday 26 July 2005 04:10, Theron Toomey wrote:
> >>Hi, thanks for the suggestions. Sorry it took me a few days to respond-
> >>there's not much time for testing between daily backup cycles.
> >>
> >>My current theory is that there is some strange corruption in my DB,
> >>perhaps in the File table.
> >>
> >>I'm not sure but I think this may be related to another problem I'm
> >>having. Restores (using option 3 or 5) of very large jobs (around 200
> >>GB) fail while writing the restore bootstrap. I suspect that while
> >>reading my catalog to generate the restore.bsr, bacula is encountering
> >>some corruption, which may also explain the strange garbage in my other
> >>restores.
> >
> > Could you send me the console and any output from this "fail" so I can
> > see what is going on.
> >
> >>This isn't necessarily pertinent but I have seen a couple interesting
> >>results with these large restores, varying from the SD segfaulting
> >>immediately to sitting in an infinite loop, eating about half the system
> >>memory, and then segfaulting (no, it's not using the tls lib).
> >>Here's a gdb trace of the latter behavior if you are curious:
> >>http://www.duke.edu/~ttoomey/misc/bacula-sd-debug.3.txt.gz
> >
> > Could you send me the bootstrap file from this? When doing the restore
> > and it reaches the question yes/mod/no, it will have printed the location
> > of the bootstrap file just prior to issuing the prompt. Before answering
> > the prompt, you can copy it to another location (after answering the
> > prompt, it usually deletes the file).
> >
> >>After running dbcheck, it did cough up an error while restoring before
> >>the SD died:
> >>25-Jul 11:05 fury: restore.2005-07-25_11.03.07 Fatal error: Bootstrap
> >>file error: expected an integer or a range, got T_EOL: =
> >>
> >> : Line 5543394, col 10 of file
> >>
> >>/var/bacula/fury.restore.2005-07-25_11.03.07.bootstrap
> >>FileIndex=
> >>25-Jul 11:05 fury: restore.2005-07-25_11.03.07 Fatal error: job.c:1662
> >>Comm error with SD. bad response to Bootstrap. ERR=No data available
> >>
> >>I plan on filing a bug about the SD issue after I do some more testing
> >>to try and isolate the problem. I think, whatever the corruption is, it
> >>should probably be handled more gracefully by the SD (if my theory is
> >>right).
> >
> > Yes, please do open a bug report -- preferably one for each problem that
> > you consider unrelated. I am unable to adequately track and resolve
> > complicated problems such as this from emails.
> >
> >>Err.. anyway, please see below for my answers.
> >>
> >>Martin Simmons wrote:
> >>> Theron> Hello,
> >>> Theron> I'm seeing some strange behavior with restores under
> >>> Theron> 1.36.3/RHEL 3 using an AIT-3 drive. I'm not quite sure what is
> >>> Theron> causing it and I'd really appreciate any suggestions.
> >>>
> >>> Theron> When I choose restore option 5 (Select the most recent backup)
> >>> Theron> bacula proceeds to restore data from the last full and
> >>> Theron> subsequent diff/incr jobs.
> >>> Theron> However, for large restores (>50 GB), I notice a few dozen
> >>> Theron> error messages like:
> >>> Theron> Error: attribs.c:339 File size of restored file /foo/bar not
> >>> Theron> correct. Original [file size], restored [large, bogus file
> >>> Theron> size].
> >>>
> >>> Theron> Comparing the restores against the live data, I see that the
> >>> Theron> restored files have lots of random garbage inserted/appended
> >>> Theron> to them.
> >>>
> >>> Theron> However, when I manually find the jobIDs of the
> >>> Theron> full/diffs/incrs and restore them individually with restore
> >>> Theron> option 3, there is no corruption and the files all seem fine.
> >>>
> >>>Does "individually" mean one at a time, i.e. repeated use of option 3?
> >>>If so, do you get corruption if you enter all the jobIDs into a single
> >>>option 3 in the same order as bacula chose from option 5?
> >>
> >>Yes, individually means one at a time with repeated use of option 3. If
> >>I enter the same JobID's from option 5 into option 3, I see exactly the
> >>same corruption on the same files as when I use option 5.
> >>
> >>> Theron> Most of the corrupt files are older than the last full;
> >>> Theron> Perhaps there's something in the diff/incr jobs that corrupts
> >>> Theron> the files from the full job. However, most of the corrupt files
> >>> Theron> are older than the last full and so are not even present in the
> >>> Theron> diff/incr jobs.
> >>>
> >>> Theron> Has anyone seen behavior like this or have any ideas about
> >>> Theron> where to look?
> >>>
> >>>For a particular restore, is it always the same files that are
> >>>corrupted? If yes, is the garbage really random or is it the same
> >>>garbage each time? Also, what happens if you use option 5 but only
> >>>mark one of the corrupted files for restore?
> >>
> >>If I perform two identical restores, I see the same files corrupted with
> >>the same garbage.
> >>The md5sums of the corresponding files from each restore are a match
> >>so the garbage isn't random or at least it's dependent on something
> >>else.
> >>
> >>If I mark just one of the corrupted files for restore, there is no
> >>corruption in the file.
> >
> > Can you send me the corresponding bootstrap files for these two cases so
> > that I can compare them.
> >
> > What database are you using, and can you give me an idea how big it is?
> >
> >>Thanks for your help, I hope I'm on the right track.
> >
> > Well, you are certainly doing the right things, but it is a bit early to
> > tell what the right track really is ...

--
Best regards,

Kern

  (">
  /\
  V_V

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
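
Editorial note: the bootstrap error quoted in this thread ("expected an integer or a range, got T_EOL" at a bare `FileIndex=`) points at a keyword line with nothing after the equals sign. A saved copy of a bootstrap file could be checked for such lines with something like the sketch below. It is only an illustration, not part of Bacula: the function names are made up for this note, and it assumes the file is plain text with one `keyword=value` entry per line and that lines starting with `#` are comments.

```python
def find_empty_values(lines):
    """Yield (line_number, text) for each 'keyword=' line with an empty value.

    Assumption: one keyword=value entry per line; '#' starts a comment line.
    """
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, value = text.partition("=")
        if sep and not value.strip():
            yield number, text


def scan_bootstrap(path):
    """Return the list of empty-valued lines found in the file at `path`."""
    # errors="replace" keeps the scan going if the file contains stray bytes.
    with open(path, errors="replace") as handle:
        return list(find_empty_values(handle))
```

Calling `scan_bootstrap()` on a copy of the bootstrap file made before answering the yes/mod/no prompt would list the line numbers of any empty entries, which could then be compared with the line number in the error message.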