On Tuesday 26 July 2005 04:10, Theron Toomey wrote:
> Hi, thanks for the suggestions.  Sorry it took me a few days to respond-
> there's not much time for testing between daily backup cycles.
>
> My current theory is that there is some strange corruption in my DB,
> perhaps in the File table.
>
> I'm not sure but I think this may be related to another problem I'm
> having.  Restores (using option 3 or 5) of very large jobs (around 200
> GB) fail while writing the restore bootstrap.  I suspect that while
> reading my catalog to generate the restore.bsr, bacula is encountering
> some corruption, which may also explain the strange garbage in my other
> restores.

Could you send me the console and any output from this "fail" so I can see
what is going on?

>
> This isn't necessarily pertinent but I have seen a couple interesting
> results with these large restores, varying from the SD segfaulting
> immediately to sitting in an infinite loop, eating about half the system
> memory, and then segfaulting (no, it's not using the tls lib).  Here's a
> gdb trace of the latter behavior if you are curious:
> http://www.duke.edu/~ttoomey/misc/bacula-sd-debug.3.txt.gz

Could you send me the bootstrap file from this?  When doing the restore and
it reaches the question yes/mod/no, it will have printed the location of the
bootstrap file just prior to issuing the prompt.  Before answering the
prompt, you can copy it to another location (after answering the prompt, it
usually deletes the file).

>
> After running dbcheck, it did cough up an error while restoring before
> the SD died:
>
> 25-Jul 11:05 fury: restore.2005-07-25_11.03.07 Fatal error: Bootstrap file error: expected an integer or a range, got T_EOL: =
>             : Line 5543394, col 10 of file /var/bacula/fury.restore.2005-07-25_11.03.07.bootstrap
> FileIndex=
> 25-Jul 11:05 fury: restore.2005-07-25_11.03.07 Fatal error: job.c:1662 Comm error with SD. bad response to Bootstrap. ERR=No data available
>
> I plan on filing a bug about the SD issue after I do some more testing
> to try and isolate the problem.  I think, whatever the corruption is, it
> should probably be handled more gracefully by the SD (if my theory is
> right).

Yes, please do open a bug report -- preferably one for each problem that you
consider unrelated.  I am unable to adequately track and resolve complicated
problems such as this from emails.

>
> Err.. anyway, please see below for my answers.
>
> Martin Simmons wrote:
> > Theron> Hello,
> >
> > Theron> I'm seeing some strange behavior with restores under 1.36.3/RHEL 3
> > Theron> using an AIT-3 drive.
> > Theron> I'm not quite sure what is causing it and I'd really
> > Theron> appreciate any suggestions.
> >
> > Theron> When I choose restore option 5 (Select the most recent backup) bacula
> > Theron> proceeds to restore data from the last full and subsequent diff/incr
> > Theron> jobs.  However, for large restores (>50 GB), I notice a few dozen error
> > Theron> messages like:
> >
> > Theron> Error: attribs.c:339 File size of restored file /foo/bar not correct.
> > Theron> Original [file size], restored [large, bogus file size].
> >
> > Theron> Comparing the restores against the live data, I see that the restored
> > Theron> files have lots of random garbage inserted/appended to them.
> >
> > Theron> However, when I manually find the jobIDs of the full/diffs/incrs and
> > Theron> restore them individually with restore option 3, there is no corruption
> > Theron> and the files all seem fine.
> >
> > Does "individually" mean one at a time, i.e. repeated use of option 3?
> > If so, do you get corruption if you enter all the jobIDs into a single
> > option 3 in the same order as bacula chose from option 5?
>
> Yes, individually means one at a time with repeated use of option 3.  If
> I enter the same JobID's from option 5 into option 3, I see exactly the
> same corruption on the same files as when I use option 5.
>
> > Theron> Most of the corrupt files are older than the last full; Perhaps there's
> > Theron> something in the diff/incr jobs that corrupts the files from the full
> > Theron> job.  However, most of the corrupt files are older than the last full and
> > Theron> so are not even present in the diff/incr jobs.
> >
> > Theron> Has anyone seen behavior like this or have any ideas about where to look?
> >
> > For a particular restore, is it always the same files that are corrupted?
> > If yes, is the garbage really random or is it the same garbage each
> > time?  Also, what happens if you use option 5 but only mark one of the
> > corrupted files for restore?
>
> If I perform two identical restores, I see the same files corrupted with
> the same garbage.  The md5sums of the corresponding files from each
> restore are a match so the garbage isn't random or at least it's
> dependent on something else.
>
> If I mark just one of the corrupted files for restore, there is no
> corruption in the file.

Can you send me the corresponding bootstrap files for these two cases so
that I can compare them?  What database are you using, and can you give me
an idea how big it is?

>
> Thanks for your help, I hope I'm on the right track.

Well, you are certainly doing the right things, but it is a bit early to
tell what the right track really is ...

-- 
Best regards,

Kern

  (">
  /\
  V_V

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users