On Friday 28 October 2005 12:00, Michele Baldessari wrote:
> Since I've spent some time looking at bacula's and mt's source code to
> figure out why just bacula was barfing on an empty tape drive on a
> 2.6.x linux kernel, I think it might be worth adding the following to
> the FAQ. (Apologies if something similar is in the docs already..I did
> look but found nothing relevant).
>
> Note: I stumbled upon this issue while configuring bacula 1.36.2 on a
> 2.6.x based system on an IBM 4560SLX tape library.
>
> thanks for all your work,
> Michele Baldessari
>
> * Bacula, kernel 2.6.x and SCSI hangs when tape drive is empty
>
> Bacula 1.36.x does not open the device with O_NONBLOCK and thus isn't
> able to cope with the kernel change in the st driver [1]. The solution
> is either to upgrade bacula to a 1.37.x version or to use a 2.4.x kernel.
>
> [1] http://www.ussg.iu.edu/hypermail/linux/kernel/0302.2/0066.html :
> """
> The open() behaviour of st was changed at 2.5.3 to conform with SUS
> (blocking) and what the other Unices do (timeout). If the device is opened
> without O_NONBLOCK, the driver waits for some time (default 2 minutes)
> for the device to become ready. If it does not become ready, an error is
> returned.

Do you know where the kernel documentation on this is and where I can find out 
more about SUS (whatever it means)?

My experience here is that if there is no tape in the drive, you must open it 
in read-only mode and with O_NONBLOCK set.  In that case, the open succeeds, 
and you can then try to determine whether any media is present.
This is a real nightmare for me (and Bacula) because the use of O_NONBLOCK 
seems to be very system dependent -- for example, on FreeBSD the ioctl() to 
clear O_NONBLOCK apparently is not valid on a tape drive.
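
To make the open sequence described above concrete, here is a minimal sketch 
in Python (not Bacula's actual code).  The helper names are made up for 
illustration, and clearing the flag through fcntl(F_SETFL) is an assumption 
about how one might do it; as noted above, that step may be rejected on some 
systems, so the sketch reports failure instead of assuming it works.

```python
import fcntl
import os


def open_tape_nonblocking(path):
    """Open `path` read-only with O_NONBLOCK set and return the descriptor.

    With O_NONBLOCK the open is not supposed to wait for the drive to
    become ready, so it can return even when no tape is loaded.
    """
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)


def clear_nonblock(fd):
    """Try to put `fd` back into blocking mode.

    Returns True on success and False if the system refuses the request
    (hypothetical handling; some platforms may reject this on tape devices).
    """
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    try:
        fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
    except OSError:
        return False
    return True
```

A caller would pass its own tape device node to open_tape_nonblocking(), 
check for media, and only then call clear_nonblock() before doing normal 
blocking reads.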

As far as I can tell, this change has broken a number of functions, such as 
Polling in Bacula.  Also, if you use "Offline on Unmount" and Bacula wants 
another tape, Bacula will retry opening the drive for 5 minutes (based on the 
idea that the open fails immediately); however, since open() now blocks for 
two minutes, Bacula will fail the job after 20 minutes.
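
As a rough illustration of why a blocking open() stretches the retry window, 
here is a small sketch.  It assumes the retry loop is bounded by a number of 
attempts rather than by wall-clock time; the attempt count and sleep interval 
below are invented for the example and are not Bacula's real values, so the 
result is not meant to reproduce the 20-minute figure exactly.

```python
def total_retry_seconds(attempts, sleep_s, open_block_s):
    """Wall-clock seconds spent by a loop that tries open() `attempts` times.

    Each failed open() is assumed to take `open_block_s` seconds before it
    returns, and the loop sleeps `sleep_s` seconds after every failure.
    """
    return attempts * (open_block_s + sleep_s)


# Hypothetical numbers: 10 attempts, 30 s between attempts.
fast_fail = total_retry_seconds(10, 30, 0)     # open() fails immediately
blocking = total_retry_seconds(10, 30, 120)    # open() blocks 2 minutes

print(fast_fail / 60, "minutes when open() fails at once")
print(blocking / 60, "minutes when open() blocks for 2 minutes")
```

With these assumed values the same loop goes from 5 minutes to 25 minutes, 
purely because each open() now waits for the driver timeout.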

I wonder if the guys who changed this in the kernel were aware of the 
consequences.  

If anyone knows if and where the kernel guys publish these things, I sure 
would like to know.  If I have to read through 50 million lines of kernel 
change log to find this information, which I cannot do, we will probably have 
more of these kinds of surprises in the future ...  :-(

-- 
Best regards,

Kern

  (">
  /\
  V_V


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
