In the message dated: Tue, 24 Jan 2012 19:09:15 GMT, The pithy ruminations from Martin Simmons on
Thanks for replying. <Re: [Bacula-users] critical error -- tape labels get corrupted, previous backu ps unreadable> were: => >>>>> On Mon, 23 Jan 2012 18:47:31 -0500, mark bergman said: => > => > I'm experiencing a critical problem where tape labels on volumes with data => > get corrupted, leaving all data on the tape inaccessible to bacula. => > => > I'm running bacula 5.2.2 built from source, under Linux (CentOS 5.7 => > x86_64). => > => > This problem has happened with approximately 15 tapes over approximately 6 => > months, mostly new LTO-4 media, but some LTO-3 media that's being reused. => > The problem is sporadic, appearing in approximately 1 out of 60 tapes => > per week. => > => > I do not think the issue is related to the physical media or the tape => > drives. One tape was last written successfully when in drive 0, then appears => > corrupt when a later job tries to use is in drive 1. Another tape was last => > written successfully when in drive 1, then appears corrupt when a later job => > tries to use it in drive 0. => => Why do think it isn't a hardware problem? => I don't think it's a hardware problem because: the vast majority of tape access (read or write) doesn't result in corrupted labels there aren't SCSI, tape, or bacula errors reported during backups (within Bacula, the OS, or the tape library console) the tapes are readable--though the data is not usable by bacula the problem occurs on tapes that have been written and read in both drives (this doesn't rule out some common element in the tape library) => Bacula only looks at the label when a volume is mounted, so it could be => written unsuccessfully but you wouldn't know that until later. Interesting... thanks for bringing this up...I'm checking the logs (which only go back to early Dec) to see if any of the corrupted tapes got unloaded, reloaded and then written to successfully.... [pause] Some of the tapes that are not corrupt have gone through multiple load/unload cycles for different jobs....so the act of reloading a tape in order to append new jobs does not always cause corruption in the label. The initial label (via "label barcodes", assigning the tape to the Scratch pool) must be valid, or bacula would detect the corruption when it later loads the tape and uses is for its first job. Does bacula actually relabel a tape with a working pool (Full, Incremental, or Archive), or will it continue to have the pool name (Scratch) that was assigned during "label barcodes"? I'm also going to load all the other tapes in the changer and check their labels with btape and dump. [pause] Done. I loaded, dumped the first ~640MB from the tape, and used 'btape' to read the label from each tape in the changer (35). There were no I/O errors, all tapes were 'readable', though 4 have corrupt labels. => => => > Here are the log records for a particular volume. It was labeled about => > Dec 22, 2011. First used on Jan 4 2012. Used successfully for 10 jobs => > (350.49GB), then the label was corrupted. => > => > ------------------------------ => > 04-Jan 06:24 sbia-infr-vbacula JobId 42676: Using Volume "004090" from 'Scratch' pool. => > 04-Jan 06:25 sbia-infr-vbacula JobId 42676: Wrote label to prelabeled Volume "004090" on devic What does "Wrote label to prelabeled Volume" mean, exactly? Should the label be changed from when it was first written with "label barcodes"? The labels on new tapes that have been used but are not corrupted show a pool of "Scratch". However, the database shows that those tapes have been used for jobs in the working pools (Full, Incremental, Archive). For example, the database shows that volume 004056 is 'full', has ~1513GB of data from 4 backups, and is in the "Full" pool, but the label shows the pool as "Scratch". It appears that: label barcodes successfully labels a tape and puts it into the Scratch pool when the tape is first used, bacula logs the message "Wrote label to prelabed Volume", but the label is not updated ('btape readlabel' reports Scratch), and jobs can be written to the tape at some time the label on some tapes gets corrupted when the tape is reloaded, bacula detects the corruption => e "ml6000-drv1" (/dev/tape1-ml6000) => > 04-Jan 06:25 sbia-infr-vbacula JobId 42676: New volume "004090" mounted on device "ml6000-drv1 => " (/dev/tape1-ml6000) at 04-Jan-2012 06:25. => => Is /dev/tape1-ml6000 a non-rewinding device (like /dev/nst0)? Yes, so successive writes to the same tape without unloading should append. The tape hardware consists of 2x LTO-4 drives in a 40 slot Dell ML6000 (rebranded Adic) changer. => => => > At this point, the volume 004090 is unusable. Running 'btape' on that volume reports => > ---------------------------- => > [root@sbia-infr1 working]# ../bin/btape -v ml6000-drv0 => > Tape block granularity is 1024 bytes. => > btape: butil.c:290 Using device: "ml6000-drv0" for writing. => > 23-Jan 18:14 btape JobId 0: 3301 Issuing autochanger "loaded? drive 0" => > command. => > 23-Jan 18:14 btape JobId 0: 3302 Autochanger "loaded? drive 0", result is Slot => > 9. => > btape: btape.c:477 open device "ml6000-drv0" (/dev/tape0-ml6000): OK => > *readlabel => > btape: btape.c:526 Volume has no label. => > => > Volume Label: => > Id : **error**VerNo : 0 => > VolName : => > PrevVolName : => > VolFile : 0 => > LabelType : Unknown 0 => > LabelSize : 0 => > PoolName : => > MediaType : => > PoolType : => > HostName : => > Date label written: -4712-01-01 at 00:00 => > ---------------------------- => > => > => > => > => > However, there _is_ data on the tape. I'm able to read the tape via dd => > (ibs=64k). The ASCII data at the beginning of the tape shows fragments of the => > Bacula label and data that corresponds to some of the backups: => => The output of => => od -tx1 /tmp/vol4090.header | head -n 40 Sure. 0000000 00 6a 49 7b 00 00 01 88 00 00 00 00 42 42 30 32 0000020 00 00 00 2b 4f 0e 1c 72 ff ff ff fc ff ff 58 6b 0000040 00 00 00 9a 42 61 63 75 6c 61 20 31 2e 30 20 69 0000060 6d 6d 6f 72 74 61 6c 0a 00 00 00 00 0b 00 00 a7 0000100 95 00 04 b6 67 65 24 6f 0e 00 00 00 00 00 00 00 0000120 00 41 72 63 68 69 76 65 00 42 61 63 6b 75 70 00 0000140 61 72 63 68 69 76 65 00 73 62 69 61 2d 69 6e 66 0000160 72 2d 76 6e 66 73 32 00 61 72 63 68 69 76 65 2e 0000200 32 30 31 32 2d 30 31 2d 31 33 5f 30 36 2e 34 35 0000220 2e 30 30 5f 30 31 00 61 72 63 68 69 76 65 00 00 0000240 00 00 42 00 00 00 49 74 6e 49 70 52 68 4e 35 4a 0000260 56 46 58 68 77 2f 2b 30 2f 73 41 73 44 00 ff ff 0000300 ff fb ff ff 58 6b 00 00 00 be 42 61 63 75 6c 61 0000320 20 31 2e 30 20 69 6d 6d 6f 72 74 61 6c 0a 00 00 0000340 00 00 0b 00 00 a7 95 00 04 b6 67 65 26 18 80 00 0000360 00 00 00 00 00 00 00 41 72 63 68 69 76 65 00 42 0000400 61 63 6b 75 70 00 61 72 63 68 69 76 65 00 73 62 0000420 69 61 2d 69 6e 66 72 2d 76 6e 66 73 32 00 61 72 0000440 63 68 69 76 65 2e 32 30 31 32 2d 30 31 2d 31 33 0000460 5f 30 36 2e 34 35 2e 30 30 5f 30 31 00 61 72 63 0000500 68 69 76 65 00 00 00 00 42 00 00 00 49 74 6e 49 0000520 70 52 68 4e 35 4a 56 46 58 68 77 2f 2b 30 2f 73 0000540 41 73 44 00 00 00 00 00 00 00 00 00 00 00 00 00 0000560 00 00 00 00 00 00 00 00 00 00 00 54 00 00 00 53 0000600 00 00 00 00 00 00 00 54 00 00 00 00 00 00 00 00 0000620 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 0176000 => => might be useful, to see why Bacula rejects it. I hope that's more useful to you than me! :) I can provide dumps from headers of 3 other tapes with corrupted labels. Thanks, Mark => => __Martin ------------------------------------------------------------------------------ Keep Your Developer Skills Current with LearnDevNow! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future releases when you subscribe now! http://p.sf.net/sfu/learndevnow-d2d _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users