On Mon, 8 Dec 2008 06:48:52 -0800 Kern Sibbald <k...@sibbald.com> wrote
> This is a support issue and is either a:
> 1. Configuration error (most likely)
> or
> 2. An OS (driver) bug (less likely)
>
> > After noticing a curious warning message from a catalog backup, I have
> > been doing a little investigation and I now think that, with certain
> > types of drive, a backup which spans more than one tape can corrupt the
> > data being written at the point where the tape change takes place.
> > My catalog backup produced this output:
> >
> > 23-Nov 22:03 gershwin-dir JobId 506: Using Device "DDS3-0"
> > 23-Nov 22:06 gershwin-sd JobId 506: End of Volume "MainCatalog-004" at
> > 57:4963 on device "DDS3-0" (/dev/rmt/1cbn). Write of 64512 bytes got 0.
> > 23-Nov 22:06 gershwin-sd JobId 506: Error: Re-read of last block OK, but
> > block numbers differ. Last block=4963 Current block=4963. 23-Nov 22:06

Correct .... it is a configuration error.

Sorry to revive such an old thread, but it was *very* difficult to figure this one out (and it's also an internal drive on an important machine, so I wasn't able to experiment very often). I'm sending this to the developers' list because developers are more likely to need this type of low-level detail, e.g. when porting, or when arguing with device driver maintainers :-) I'll post a simplified version to bacula-users.

What this comes down to is the way in which the tape drive handles the write which is in progress when the tape reaches the logical EOM (EW EOM, which I assume means "Early Warning EOM"). The data is successfully written, of course, and the drive will also return a sense code indicating EOM. The important thing here, though, appears to be the value of the "valid" bit in the sense data. If the valid bit is on, that indicates to the device driver that all of the data has been written to tape. If off, it indicates the data has not been written successfully.
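
To make the two cases concrete, here is a toy model in Python. It is not driver code, only a sketch of the behaviour as I understand it. The function names and the block labels A to D are made up for illustration, and it assumes that once the driver has seen EOM with the valid bit set, it fails the next write without sending anything to the drive.

```python
BLOCK_SIZE = 64512  # block size used by btape in this message


def simulate(blocks, eom_index, valid_bit):
    """Toy model of write() results around EW EOM (hypothetical helper).

    blocks    -- labels of the blocks the application tries to write
    eom_index -- index of the block being written when EW EOM is reached
    valid_bit -- True if the drive sets the Valid bit in the EOM sense data

    Returns (results, on_tape): the value each write() returns, and the
    labels of the blocks that physically ended up on the tape.
    """
    results, on_tape = [], []
    eom_seen = False
    for i, label in enumerate(blocks):
        if eom_seen:
            # Assumption: the driver fails this write without touching the drive.
            results.append(0)
            break
        on_tape.append(label)      # the drive writes the block in either case
        if i == eom_index and not valid_bit:
            results.append(0)      # driver reports that nothing was written
            break
        results.append(BLOCK_SIZE)
        eom_seen = (i == eom_index)
    return results, on_tape


def last_block_matches(blocks, results, on_tape):
    """True if the last block on tape is the last one reported as written."""
    acked = [b for b, r in zip(blocks, results) if r == BLOCK_SIZE]
    return on_tape[-1] == acked[-1]


blocks = ["A", "B", "C", "D"]

# Valid bit off: EW EOM is reached while writing "D".
res_off, tape_off = simulate(blocks, eom_index=3, valid_bit=False)
# Valid bit on: EW EOM is reached while writing "C"; "D" is refused.
res_on, tape_on = simulate(blocks, eom_index=2, valid_bit=True)

print(res_off, tape_off, last_block_matches(blocks, res_off, tape_off))
print(res_on, tape_on, last_block_matches(blocks, res_on, tape_on))
```

In this model the application sees the same sequence of return values in both cases (three full writes, then 0). The only difference is whether the block that got the 0 is actually on the tape, which is what a re-read of the last block would detect.
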
Presumably the valid-bit-off behaviour is intended to keep all of the data within EW EOM (perhaps some drives can't read past this point), and I can only assume that the application must be expected to reposition the tape before it writes the EOD markers.

I eventually found this (after tearing my hair out over what the options were):

http://www.impediment.com/hp/hp_2.ps (HP DDS Configuration Guide)

The relevant option is this:

No EW EOM Residue:
  True  - If CHECK CONDITION is reported for EW EOM, the Sense data
          will not have the Valid bit set.
  False - The Valid bit will be set in the Sense data if CHECK
          CONDITION is reported for EW EOM.

This option is set to True in the drive's default configuration.

If I run btape with this drive in its default config, use the "fill" command, and select the "s" option, I get the above error. Using truss to trace the system calls, I see the following:

4563: write(3, "84F6EF w\0\0FC\0\002 SF8".., 64512) = 64512
4563: write(3, "8CB4 >CC\0\0FC\0\002 SF9".., 64512) = 64512
4563: write(3, "AD05 q q\0\0FC\0\002 SFA".., 64512) = 64512
4563: write(3, "E5B3FE j\0\0FC\0\002 SFB".., 64512) = 0
4563: time() = 1268190308
4563: write(1, 0xFE975B54, 13) = 13
4563:    1 0 - M a r 0 3 : 0 5
4563: write(1, 0xFE975B54, 116) = 116
4563:    b t a p e J o b I d 0 : E n d o f V o l u m e " T e
4563:    s t V o l u m e 1 " a t 1 4 : 9 0 7 1 o n d e v i c e
4563:    " d d s 3 " ( / d e v / r m t / 1 c b n ) . W r i t e o f
4563:    6 4 5 1 2 b y t e s g o t 0 .\n
4563: ioctl(3, (('m'<<8)|1), 0x08046EF8) = 0
4563: ioctl(3, (('m'<<8)|1), 0x08046ED8) = 0
4563: ioctl(3, (('m'<<8)|1), 0x08046ED8) = 0
4563: brk(0x08107538) = 0
4563: brk(0x08117538) = 0
4563: read(3, "E5B3FE j\0\0FC\0\002 SFB".., 64512) = 64512
4563: time() = 1268190310
4563: write(1, 0xFE975B54, 13) = 13
4563:    1 0 - M a r 0 3 : 0 5
4563: write(1, 0xFE975B54, 111) = 111
4563:    b t a p e J o b I d 0 : E r r o r : R e - r e a d o f
4563:    l a s t b l o c k O K , b u t b l o c k n u m b e r
4563:    s d i f f e r . R e a d b l o c k = 1 5 2 5 7 1 W a n t
4563:    b l o c k = 1 5 2 5 7 0 .\n
4563: write(1, 0xFE975B54, 67) = 67
4563:    b t a p e : b t a p e . c : 2 3 6 0 L a s t b l o c k a
4563:    t : 1 4 : 9 0 7 0 t h i s _ d e v _ b l o c k _ n u m = 9 0
4563:    7 1\n

The last write (E5B3FE) returned 0, indicating that the write was not successful. Of course, when btape re-reads the last block, it gets this same E5B3FE block, which it doesn't expect. This is because the drive had returned sense with the valid bit set to false. In this case the driver told the application the write had not succeeded, so btape expected the preceding block (AD05) to be the last one on the tape, hence the error.

If I reconfigure the drive so that "No EW EOM Residue" is False (no mean feat; see below!), it behaves the way I would expect:

3470: write(3, "DCA8 7 X\0\0FC\0\002 e \".., 64512) = 64512
3470: write(3, " $E7 S9D\0\0FC\0\002 e ]".., 64512) = 64512
3470: write(3, "D7 U e8B\0\0FC\0\002 e ^".., 64512) = 64512
3470: write(3, " v . I 7\0\0FC\0\002 e _".., 64512) = 0
3470: time() = 1268164735
3470: write(1, 0xFE975B54, 13) = 13
3470:    0 9 - M a r 1 9 : 5 8
3470: write(1, 0xFE975B54, 117) = 117
3470:    b t a p e J o b I d 0 : E n d o f V o l u m e " T e
3470:    s t V o l u m e 1 " a t 1 4 : 1 3 5 2 3 o n d e v i c e
3470:    " d d s 3 " ( / d e v / r m t / 1 c b n ) . W r i t e o
3470:    f 6 4 5 1 2 b y t e s g o t 0 .\n
3470: ioctl(3, (('m'<<8)|1), 0x08046EF8) = 0
3470: ioctl(3, (('m'<<8)|1), 0x08046ED8) = 0
3470: ioctl(3, (('m'<<8)|1), 0x08046ED8) = 0
3470: brk(0x08107538) = 0
3470: brk(0x08117538) = 0
3470: read(3, "D7 U e8B\0\0FC\0\002 e ^".., 64512) = 64512
3470: time() = 1268164736
3470: write(1, 0xFE975B54, 13) = 13
3470:    0 9 - M a r 1 9 : 5 8
3470: write(1, 0xFE975B54, 48) = 48
3470:    b t a p e J o b I d 0 : R e - r e a d o f l a s t b
3470:    l o c k s u c c e e d e d .\n
3470: write(1, 0xFE975B54, 69) = 69
3470:    b t a p e : b t a p e . c : 2 3 6 0 L a s t b l o c k a
3470:    t : 1 4 : 1 3 5 2 2 t h i s _ d e v _ b l o c k _ n u m = 1
3470:    3 5 2 3\n

In this case, the "D7 U e8B" block is the last one written to the tape. I believe what is happening is this: the "D7 U e8B" block corresponds to the E5B3FE block in the first case, i.e. it was written to tape and the drive returned EOM sense, *but* this time the sense data had the valid bit set. Now the driver behaves as expected by returning success to the application, and when btape attempts to write the next block ( v . I 7), the driver returns 0 immediately (without attempting to write to the drive). This time, the last block on the tape is the one btape expected to be there ... and with that, I have the drive working properly (at last!)

Unfortunately, these drives are not easy to configure. There is no one-to-one correspondence between the switches and the options. You need to look in the configuration table for the set of options that most closely matches the ones you want, and then set the switches according to that line of the table (yuk!)

The default (wrong) settings for configuration switches 3-8 on this drive are: 0 1 1 1 1 1. For Solaris and presumably Linux, I used the settings which the manual recommends for Sun workstations: 1 1 1 0 0 1. Among other things, this gives the drive an "other" personality, rather than the default "HP" personality. (Phew!)

Operating System: Solaris 10 5/09 (update 7)
Bacula: 3.0.1
Drive: HP C1557A & HP C5648A DDS-3 Autoloader (6 x 24GB)

Allan