On Monday 17 April 2006 01:02, Wolfgang Denk wrote:
> Dear Kern,
>
> in message <[EMAIL PROTECTED]> you wrote:
> > All your reasoning is absolutely perfect up to this previous point.  In
> > looking at the Bacula error messages that you list above, it is always an
> > I/O error writing a Bacula block that produces the problem.  Once Bacula
> > gets an
>
> Argh... Thanks for pointing this out. So I always misinterpreted  the
> events.

Well, not necessarily.  First, you are not as familiar with Bacula messages as 
I am, and second, after more thought, I could be completely wrong; see below 
(or, as I guess I prefer to say, "perhaps it is even more complicated").

>
> > IMO, the source problem is coming when writing the buffers (a write()
> > request) and not subsequent ioctl(WEOF).  Also, between the write() that
> > fails and the ioctl(WEOF), Bacula will issue some other ioctl(), which
> > varies according to the OS.  This ioctl() on a Linux machine, for
> > example, is ioctl() MTIOCTOP with mt_op=MTIOCLRERR.  In all cases, the
> > purpose of this ioctl() between the write() and the ioctl(WEOF) is to
> > attempt to clear any error condition in the SCSI driver to permit a valid
> > EOF to terminate the Volume.  On Linux, this may not be necessary, but on
> > other OSes such as FreeBSD, the SCSI driver locks out virtually all I/O
> > operations after a serious error.
>
> OK.
>
> > My best guess is that the problem is some sort of kernel SCSI lock race
> > condition.  As a consequence, I would recommend that you concentrate on
> > writing lots of buffers as fast as you can, but from multiple processes,
> > possibly to the same or different drives.  In fact, you might try firing
> > off several hundred write processes, and possibly a few read processes to
> > another drive.
>
> I will try that, but you just blew my theory of why we see the
> problem only with Bacula, but never (yet) with any other program
> writing to tape.

Bacula *does* use the sequence write(), ioctl(WEOF).  However, this is done 
only once every 1GB by default.  Maybe this is why you only see it 
infrequently.  If the problem is happening at that point, then you will not 
see an I/O error message from the write(), but you will see one from the 
ioctl(WEOF).  Look carefully at the Bacula output.  It is also possible that 
you are getting the error from the sequence:

  write()
  ioctl(WEOF)
  write()

This is the sequence when Bacula writes an EOF once every 1GB.  Perhaps the 
ioctl(WEOF) is causing the write() of the next block to fail.  Bacula will 
then do the ioctl(clear-error) and ioctl(WEOF) "recovery attempt" I mentioned 
in my previous email.
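
To exercise that write(), ioctl(WEOF), write() pattern outside of Bacula, 
something like the following Python sketch could be used.  It is not Bacula's 
code: the function name and the injectable write/ioctl parameters are made up 
for illustration, the MTIOCTOP and MTWEOF values are assumed from Linux 
<sys/mtio.h> on x86/x86-64 and should be checked on your platform, and the 
"one filemark per max_file_size bytes" rule is a simplification.

```python
import fcntl
import os
import struct

# Assumed values from Linux <sys/mtio.h> on x86/x86-64; verify locally.
MTIOCTOP = 0x40086D01   # _IOW('m', 1, struct mtop)
MTWEOF = 5              # mt_op code: write a filemark (EOF mark)


def write_with_filemarks(fd, blocks, max_file_size,
                         write=os.write, ioctl=fcntl.ioctl):
    """Write each block to fd and issue ioctl(MTWEOF) every time at least
    max_file_size bytes have gone out since the last filemark.

    Returns the list of operations performed ("write" / "weof") so the
    sequence can be compared with what the driver logs.
    """
    ops = []
    in_file = 0
    for block in blocks:
        write(fd, block)            # raises OSError on an I/O error
        ops.append("write")
        in_file += len(block)
        if in_file >= max_file_size:
            # struct mtop { short mt_op; int mt_count; } in native layout
            ioctl(fd, MTIOCTOP, struct.pack("hi", MTWEOF, 1))
            ops.append("weof")
            in_file = 0
    return ops
```

Against a real drive, fd would come from os.open() on a non-rewinding tape 
device (the exact path depends on the system).  An OSError raised by the 
write() that follows a "weof" would point at the second scenario above, while 
one raised by the ioctl() itself would point at the first.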

All of the above could be tested by setting "Maximum File Size = 100 MB", for 
example; in that case, Bacula will write a *lot* more EOF marks (10 times 
as many as the default).
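
As a quick check of that factor of 10, here is the arithmetic in Python.  The 
40 GB job size is a made-up figure, decimal units are assumed, and one EOF 
mark per full "Maximum File Size" is a simplification.

```python
GB = 1000 ** 3
MB = 1000 ** 2


def eof_marks(bytes_written, max_file_size):
    """EOF marks written, assuming one mark per full max_file_size bytes."""
    return bytes_written // max_file_size


job = 40 * GB                            # hypothetical job size
default_marks = eof_marks(job, 1 * GB)   # default setting: 40 marks
test_marks = eof_marks(job, 100 * MB)    # 100 MB setting: 400 marks
```

So if the failure is tied to the EOF marks, the smaller setting should make 
it show up roughly ten times as often for the same amount of data.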

>
> > When the SCSI driver complains about an unexpected disconnect, it is very
> > likely because it either missed an interrupt or it issued a command at a
> > bad time (i.e. a missing lock), or it overran the SCSI command queue.
>
> I will try to run some tests...
>
> Best regards,
>
> Wolfgang Denk

-- 
Best regards,

Kern

  (">
  /\
  V_V


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
