In message <[EMAIL PROTECTED]> you wrote:
>
> Then it sounds to me more like a bacula issue rather than the SCSI tape 
> driver. 

I disagree. We get  pretty  clear  SCSI  error  messages  (unexpected
disconnect).  No matter what a user application does, the SCSI driver
must never run into such a situation. This is a SCSI driver problem.

> A problem in diagnosing it is that it is not reproducible. This could 
> indicate a 

The problem *is* reproducible. For me it happens pretty reliably. The
problem is that it takes a long time, typically hours. And I have
to admit that I didn't find (or take) the time to really start
debugging it. Probably raising debug levels for the SCSI system would
be a good start, but I'm not convinced.

BTW: I wrote before that this happens without spooling only; this was
wrong. Scanning the logs, I've seen cases of this problem when
spooling was active, too.

> timing issue as you've pointed out so if a trace is set up to catch the 
> villain 
> the incident may not occur at all. What can we do?

Let's summarize the observed symptoms again:

* On user level we see error messages like these:

        Error: block.c:538 Write error at 39:5706 on device "SLR100" (/dev/nst0). ERR=Input/output error.
        Error: Error writing final EOF to tape. This Volume may not be readable.
        dev.c:1536 ioctl MTWEOF error on "SLR100" (/dev/nst0). ERR=Input/output error.

* On system level we see error messages like these:

        sym0: unexpected disconnect
        st0: Error 700ff (sugg. bt 0x0, driver bt 0x0, host bt 0x7).
        sym0: unexpected disconnect
        st0: Error 700ff (sugg. bt 0x0, driver bt 0x0, host bt 0x7).
        st0: Error with sense data: <6>st0: Current: sense key: Unit Attention
            Additional sense: Power on, reset, or bus device reset occurred

* It happens with different types of tape drives; for me with an SLR60
  drive and 3 x SLR100 autoloaders.

* It happens with different types of SCSI controllers; for me with:
  - LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI
  - Adaptec aic7899 Ultra160 SCSI adapter
  - Adaptec AHA-2940UW Ultra SCSI adapter
  - Dawicontrol DC-29160 Ultra160 SCSI adapter

* It happens long before the tape is actually full.

* I never had any other kinds of I/O errors, only this "Error writing
  final EOF"; this boils down to an MTIOCTOP ioctl() with op=MTWEOF
  and count=1, and this is probably the major difference from all
  other tape tests I've tried: none of the other tools I use to write
  to a tape (like tar etc.) actually write an EOF themselves; they just
  close the tape device at the end of the write operations.


Maybe I'm going to write some test code for such a scenario: write
some buffers followed by an MTWEOF op...
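
A minimal sketch of what such a test could look like, assuming Linux
with <sys/mtio.h>. The helper names make_weof_op() and
write_then_weof() are made up for illustration; this is not taken from
Bacula or from the st driver, and the fill pattern and block size are
arbitrary:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <unistd.h>

/* Hypothetical helper: build the MTIOCTOP argument that asks the
 * driver to write `count` filemarks (op=MTWEOF). */
static struct mtop make_weof_op(int count)
{
    struct mtop op;

    memset(&op, 0, sizeof op);
    op.mt_op = MTWEOF;
    op.mt_count = count;
    return op;
}

/* Hypothetical helper: write `nblocks` blocks of `blksz` bytes to the
 * device at `path`, then issue MTIOCTOP with op=MTWEOF, count=1.
 * Returns 0 on success or a negative errno value on failure. */
static int write_then_weof(const char *path, size_t nblocks, size_t blksz)
{
    assert(path != NULL);

    if (blksz == 0)
        return -EINVAL;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    char *buf = malloc(blksz);
    if (buf == NULL) {
        close(fd);
        return -ENOMEM;
    }
    memset(buf, 0xA5, blksz);           /* arbitrary fill pattern */

    int rc = 0;
    for (size_t i = 0; i < nblocks && rc == 0; i++) {
        ssize_t n = write(fd, buf, blksz);
        if (n < 0)
            rc = -errno;                /* write failed */
        else if ((size_t)n != blksz)
            rc = -EIO;                  /* treat a short write as an error */
    }

    if (rc == 0) {
        /* The step that the other tools (tar etc.) do not perform. */
        struct mtop op = make_weof_op(1);
        if (ioctl(fd, MTIOCTOP, &op) < 0)
            rc = -errno;
    }

    free(buf);
    close(fd);
    return rc;
}
```

A caller would run something like write_then_weof("/dev/nst0", 10000,
65536) in a loop for a few hours and log the first negative return
value together with the kernel messages seen at that time. The block
count and block size here are placeholders, not values taken from the
Bacula configuration.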

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
The only way to learn a new programming language is by  writing  pro-
grams in it.                                        - Brian Kernighan


