Although it will generate lots of output, have you tried turning on debugging for the DIR and SD to see if anything shows up there?
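Debugging can be raised on the running daemons from bconsole with `setdebug`. The helper below is a minimal sketch that only prints the console commands. The function name and the Storage name "Autochanger" in the usage line are hypothetical, so substitute the Storage resource name from your bacula-dir.conf.

```shell
#!/bin/sh
# Hypothetical helper: print bconsole commands that set the debug level of
# the Director and of one Storage daemon. Pipe the output into bconsole.
bacula_debug_cmds() {
    level=$1     # e.g. 200 while reproducing the problem, 0 to switch it off
    storage=$2   # Storage resource name as defined in bacula-dir.conf
    # trace=1 asks each daemon to write its debug output to a trace file
    # rather than to the console.
    printf 'setdebug level=%s trace=1 dir\n' "$level"
    printf 'setdebug level=%s trace=1 storage=%s\n' "$level" "$storage"
}
```

Usage: `bacula_debug_cmds 200 Autochanger | bconsole`, then `bacula_debug_cmds 0 Autochanger | bconsole` once the problem has been captured. As I understand it, the trace file typically ends up in each daemon's working directory. Because the corruption seems tied to SD startup, it may also be worth starting bacula-sd with `-d 200` in /etc/init.d/bacula-sd so the label-reading phase is logged from the first mount.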
On 2/6/2012 8:15 PM, mark.berg...@uphs.upenn.edu wrote:
> In the message dated: Mon, 06 Feb 2012 12:43:41 GMT,
> The pithy ruminations from Martin Simmons on
> <Re: [Bacula-users] critical error -- tape labels get corrupted, previous backups unreadable> were:
>
>
> Martin,
>
> Thanks again for continuing to respond...I appreciate the feedback and
> troubleshooting help.
>
>
> => >>>>> On Fri, 03 Feb 2012 20:04:44 -0500, mark bergman said:
> => >
> => > I've added more logging to /etc/init.d/bacula-sd to confirm when tapes are
> => > ejected and to timestamp the SCSI release commands.
> => >
> => > Is it possible that bacula flagged tapes 003231 and 000312 as being in
> => > the drives because they were loaded when the server crashed, even though
> => > they were later ejected (outside of bacula's control)? Could this cause
> => > bacula to believe that the tapes were at EOT when they do get loaded, and
> => > bacula then immediately begins writing (corrupting the label)? [Unlikely
> => > that bacula would try to write before reading the label, and would then
> => > read the label after corrupting the tapes.]
> =>
> => I don't see how this could happen. Bacula issues a rewind command when it
>
> I don't see how it could happen either....but I'm searching for any
> explanation.
>
> => mounts a tape and should then know that the tape is at the start.
>
> That's what I'd expect too.
>
>
> =>
> =>
> => > When the current backup is finished, I'll extract the beginning data
> => > on each of 003231 and 000312. Is there anything you recommend in terms
> => > of checking the data on tape to determine whether the tape begins with
> => > random garbage (possibly caused by the shutdown, startup, scsi reset,
> => > etc.) or if it begins with valid bacula data that happened to overwrite
> => > the label instead of being appended?
> =>
> => Do you have a File device defined in the SD? If so, label a new File volume
>
> No.
>
> => and then append the data from the start of the tape to the end of the file
> => volume using dd and cat. You can then examine the file volume using bls -v -j
> => (the File label will allow bls to read it).
>
>
> Can I do this against a tape directly?
>
> =>
> =>
> => > Does anyone have suggestions of how to troubleshoot this further,
> => > or how to make the daemon startup process more resistant to causing
> => > any corruption?
> =>
> => The important information missing is whether 000312 was already corrupted at
> => 01-Feb 20:11. You could add some commands to the startup part of
>
>
> Hmmm....The only way that I could imagine that happening is if:
>
> bacula loads the tape as needed
>
> bacula reads the volume label
>
> {somehow the tape is rewound, either when the tape is first loaded, or
> after some backups are written}
>
> bacula writes to tape
>
> The only thing outside of bacula that touches the tape drive in any way is the
> /etc/init.d/bacula-sd script, which unloads any tapes before starting the
> daemon & after shutting down the daemon.
>
> => /etc/init.d/bacula-sd script before it unloads all tapes. E.g. do mt status,
> => mt rewind and grab a copy of the first few blocks on any loaded tapes.
>
> Sure. I'm thinking that I may modify /opt/bacula/scripts/mtx-changer to
> replace the "unload" operation with:
>
> mt rewind
> dd if=$TAPE of=/opt/bacula/working/dump_$VOLUMEID.`date '+%Y-%m-%d_%T'` ibs=64k count=1024
> mtx -f $ctl load $slot $drive
>
> Is that a suitable number of blocks to dump? I've got the dumps from 5
> corrupted tapes, and I'm trying to see if they have anything in common (for
> example, maybe the first 128k is corrupted, followed by valid data from dumps
> that should have been appended to the tape).
>
> =>
> => Also, you say that infrastructure1 server crashes. Maybe the crash caused the
> => tape to be rewound and some buffer flushed to start of the tape?
>
> I can't see how...
>
> if there was unwritten data in a buffer within the memory of the
> server infrastructure1, then when the server crashes it wouldn't
> get written to tape. The 'infrastructure' machines are part of
> an HA cluster...in this crash, the other nodes determined that
> infrastructure1 had lost communication with the quorum disk,
> and they powered off the node...even if that action reset the
> fibre loop and caused the tape library to rewind both tapes
> (unlikely), I don't know how any buffers on the infrastructure1
> server could be written when the power was out.
>
> if there was unwritten data in a buffer within the memory of
> the tape library, then I believe it must be written before any
> rewind command will be honored. If infrastructure1 sends
> data to the tape drive, that data is buffered, infrastructure1 then
> crashes, infrastructure2 runs /etc/init.d/bacula-sd (which ejects tapes,
> thereby rewinding them)...the data within the buffer in the tape
> drive would still be written before the rewind/eject command was
> executed.
>
> Thanks again for your help,
>
> Mark
>
> =>
> => __Martin
> =>
>
> ------------------------------------------------------------------------------
> Keep Your Developer Skills Current with LearnDevNow!
> The most comprehensive online learning library for Microsoft developers
> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
> Metro Style Apps, more. Free future releases when you subscribe now!
> http://p.sf.net/sfu/learndevnow-d2d
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
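For the dumps already taken, one rough way to tell "random garbage" from "valid Bacula data that overwrote the label" is to look for Bacula block headers in each dump file and see where the first one appears. The helper below is a hypothetical sketch. It assumes the version-2 block header layout as I remember it from Bacula's block.c (checksum, block length and block number as 32-bit big-endian integers, then the ASCII ID "BB02", then VolSessionId and VolSessionTime), and it relies on GNU grep's `-o -b` behaviour. It also assumes dd read each tape block whole, which should hold when `ibs` is at least the volume block size (Bacula's default is 64512 bytes, if I recall correctly). Every hit is only a candidate, since "BB02" can also occur inside backed-up data.

```shell
#!/bin/sh
# Hypothetical helper: list candidate Bacula block headers in a raw tape dump.
# ASSUMPTION: v2 header = CheckSum, BlockLen, BlockNumber (32-bit big-endian),
# then the ASCII bytes "BB02", then VolSessionId and VolSessionTime.
# Verify this against block.c in your Bacula version before trusting it.
scan_bacula_blocks() {
    dump=$1
    # C locale for byte-wise matching; -a treats binary data as text and
    # -o -b prints "<byte offset of match>:BB02" for every occurrence.
    LC_ALL=C grep -oba 'BB02' "$dump" | while IFS=: read -r idoff _; do
        hdr=$((idoff - 12))            # header starts 12 bytes before the ID
        [ "$hdr" -ge 0 ] || continue
        # Read BlockLen and BlockNumber as 8 unsigned bytes.
        set -- $(od -A n -t u1 -j $((hdr + 4)) -N 8 "$dump")
        [ $# -eq 8 ] || continue       # dump truncated inside this header
        len=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
        blk=$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))
        echo "offset=$hdr len=$len block=$blk"
    done
}
```

Usage (the file name is a placeholder): `scan_bacula_blocks /opt/bacula/working/dump_000312.<timestamp> | head`. On an intact volume I would expect the first candidate at offset 0 and later offsets to advance by the reported block length. A first hit at, say, offset 131072 with a consistent sequence after it would fit the "first 128k corrupted, followed by valid data" theory.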