On Mon, 8 Dec 2008 06:48:52 -0800 Kern Sibbald <k...@sibbald.com> wrote:

> This is a support issue and is either a:
> 1. Configuration error (most likely)
> or
> 2. An OS (driver) bug (less likely)
> > After noticing a curious warning message from a catalog backup, I have
> > been doing a little investigation and I now think that, with certain
> > types of drive, a backup which spans more than one tape can corrupt the
> > data being written at the point where the tape change takes place.

> > My catalog backup produced this output:
> >
> > 23-Nov 22:03 gershwin-dir JobId 506: Using Device "DDS3-0"
> > 23-Nov 22:06 gershwin-sd JobId 506: End of Volume "MainCatalog-004" at
> > 57:4963 on device "DDS3-0" (/dev/rmt/1cbn). Write of 64512 bytes got 0.
> > 23-Nov 22:06 gershwin-sd JobId 506: Error: Re-read of last block OK, but
> > block numbers differ. Last block=4963 Current block=4963. 23-Nov 22:06

Correct... it is a configuration error. Sorry to revive such an old thread,
but it was *very* difficult to figure this one out (and it's also an internal
drive on an important machine, so I wasn't able to experiment very often).

I'm sending this to the developers' list because developers are more likely
to need this type of low-level detail, e.g. when porting, or when arguing with
device driver maintainers :-) I'll post a simplified version to bacula-users.

What this comes down to is the way in which the tape drive handles the write
that is in progress when the tape reaches the logical EOM (EW EOM, which I
assume means "Early Warning EOM"). The data is successfully written, of course,
and the drive also returns sense data indicating EOM.

The important thing here, though, appears to be the value of the "valid" bit
in the sense data. If the valid bit is on, that indicates to the device driver
that all of the data has been written to tape. If off, it indicates the data
has not been written successfully. Presumably this is intended to keep all of
the data within EW EOM (perhaps some drives can't read past this point), and
I can only assume that the application is expected to reposition the tape
before it writes the EOD markers.

I eventually found this (after tearing my hair out over what the options were):
http://www.impediment.com/hp/hp_2.ps (HP DDS Configuration Guide). The
relevant option is this:

No EW EOM Residue:
        True  - If CHECK CONDITION is reported for EW EOM, the Sense data
                will not have the Valid bit set.
        False - The Valid bit will be set in the Sense data if CHECK CONDITION
                is reported for EW EOM.

This option is set to True in the drive's default configuration. If I run btape
with this drive in its default config, use the "fill" command, and select the
"s" option, I get the above error. Tracing the system calls with truss shows
the following:

4563:   write(3, "84F6EF w\0\0FC\0\002 SF8".., 64512)   = 64512
4563:   write(3, "8CB4 >CC\0\0FC\0\002 SF9".., 64512)   = 64512
4563:   write(3, "AD05 q q\0\0FC\0\002 SFA".., 64512)   = 64512
4563:   write(3, "E5B3FE j\0\0FC\0\002 SFB".., 64512)   = 0
4563:   time()                                          = 1268190308
4563:   write(1, 0xFE975B54, 13)                        = 13
4563:      1 0 - M a r   0 3 : 0 5
4563:   write(1, 0xFE975B54, 116)                       = 116
4563:      b t a p e   J o b I d   0 :   E n d   o f   V o l u m e   " T e
4563:      s t V o l u m e 1 "   a t   1 4 : 9 0 7 1   o n   d e v i c e
4563:      " d d s 3 "   ( / d e v / r m t / 1 c b n ) .   W r i t e   o f
4563:        6 4 5 1 2   b y t e s   g o t   0 .\n
4563:   ioctl(3, (('m'<<8)|1), 0x08046EF8)              = 0
4563:   ioctl(3, (('m'<<8)|1), 0x08046ED8)              = 0
4563:   ioctl(3, (('m'<<8)|1), 0x08046ED8)              = 0
4563:   brk(0x08107538)                                 = 0
4563:   brk(0x08117538)                                 = 0
4563:   read(3, "E5B3FE j\0\0FC\0\002 SFB".., 64512)    = 64512
4563:   time()                                          = 1268190310
4563:   write(1, 0xFE975B54, 13)                        = 13
4563:      1 0 - M a r   0 3 : 0 5
4563:   write(1, 0xFE975B54, 111)                       = 111
4563:      b t a p e   J o b I d   0 :   E r r o r :   R e - r e a d   o f
4563:        l a s t   b l o c k   O K ,   b u t   b l o c k   n u m b e r
4563:      s   d i f f e r .   R e a d   b l o c k = 1 5 2 5 7 1   W a n t
4563:        b l o c k = 1 5 2 5 7 0 .\n
4563:   write(1, 0xFE975B54, 67)                        = 67
4563:      b t a p e :   b t a p e . c : 2 3 6 0   L a s t   b l o c k   a
4563:      t :   1 4 : 9 0 7 0   t h i s _ d e v _ b l o c k _ n u m = 9 0
4563:      7 1\n

The last write (E5B3FE) returned 0, telling btape that the block had not been
written. Yet when btape re-reads the last block, it gets that same E5B3FE
block, which it doesn't expect. The drive had in fact written the block, but it
returned sense data with the valid bit clear, so the driver told the
application the write had not succeeded. btape therefore expected the preceding
block (AD05) to be the last one on the tape, hence the error.

If I reconfigure the drive so that "No EW EOM Residue" is False (no mean feat;
see below!), it behaves the way I would expect:

3470:   write(3, "DCA8 7 X\0\0FC\0\002 e \".., 64512)   = 64512
3470:   write(3, " $E7 S9D\0\0FC\0\002 e ]".., 64512)   = 64512
3470:   write(3, "D7 U e8B\0\0FC\0\002 e ^".., 64512)   = 64512
3470:   write(3, " v . I 7\0\0FC\0\002 e _".., 64512)   = 0
3470:   time()                                          = 1268164735
3470:   write(1, 0xFE975B54, 13)                        = 13
3470:      0 9 - M a r   1 9 : 5 8
3470:   write(1, 0xFE975B54, 117)                       = 117
3470:      b t a p e   J o b I d   0 :   E n d   o f   V o l u m e   " T e
3470:      s t V o l u m e 1 "   a t   1 4 : 1 3 5 2 3   o n   d e v i c e
3470:        " d d s 3 "   ( / d e v / r m t / 1 c b n ) .   W r i t e   o
3470:      f   6 4 5 1 2   b y t e s   g o t   0 .\n
3470:   ioctl(3, (('m'<<8)|1), 0x08046EF8)              = 0
3470:   ioctl(3, (('m'<<8)|1), 0x08046ED8)              = 0
3470:   ioctl(3, (('m'<<8)|1), 0x08046ED8)              = 0
3470:   brk(0x08107538)                                 = 0
3470:   brk(0x08117538)                                 = 0
3470:   read(3, "D7 U e8B\0\0FC\0\002 e ^".., 64512)    = 64512
3470:   time()                                          = 1268164736
3470:   write(1, 0xFE975B54, 13)                        = 13
3470:      0 9 - M a r   1 9 : 5 8
3470:   write(1, 0xFE975B54, 48)                        = 48
3470:      b t a p e   J o b I d   0 :   R e - r e a d   o f   l a s t   b
3470:      l o c k   s u c c e e d e d .\n
3470:   write(1, 0xFE975B54, 69)                        = 69
3470:      b t a p e :   b t a p e . c : 2 3 6 0   L a s t   b l o c k   a
3470:      t :   1 4 : 1 3 5 2 2   t h i s _ d e v _ b l o c k _ n u m = 1
3470:      3 5 2 3\n

In this case, the "D7 U e8B" block is the last one written to the tape. I
believe what is happening is this: the "D7 U e8B" block corresponds to the
E5B3FE block in the first case, i.e. it was written to tape, the drive
returned EOM sense, *but* this time the sense data had the valid bit set.
Now the driver behaves as expected, returning success to the application,
and when btape attempts to write the next block (" v . I 7"), the driver
returns 0 immediately (without attempting to write to the drive).

This time, the last block on the tape is the one btape expected to be there
... and with that, I have the drive working properly (at last!)

Unfortunately, these drives are not easy to configure. There is no one-to-one
correspondence between the switches and the options. You need to find the row
in the configuration table whose set of options most closely matches the ones
you want, and then set the switches according to that row (yuk!)

The default (wrong) settings for configuration switches 3-8 on this drive
are: 0 1 1 1 1 1. For Solaris and presumably Linux, I used the settings which
the manual recommends for Sun workstations: 1 1 1 0 0 1. Among other things,
this gives the drive an "other" personality, rather than the default "HP"
personality.

(Phew!)

Operating System: Solaris 10 5/09 (update 7)
Bacula: 3.0.1
Drive: HP C1557A & HP C5648A DDS-3 Autoloader (6 x 24GB)

Allan



_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
