Hello,

On 1/17/2006 12:10 AM, Alexander Bergolth wrote:
On 01/14/06 15:20, Julien Cigar wrote:

I had exactly the same problem some months ago, full thread can be found here: http://www.nabble.com/Still-problems-with-my-Sony-tapes-t504270.html


Thanks for the hint. But I don't think that this is the same problem.
Your errors were DDS related; I'm using a DLT drive. Besides, the SCSI errors are pretty different, and your problems seem to have been SCSI-ID related. I don't suspect this to be the problem in my case, as the server was shipped with this configuration and the tape drive has been working without any problems for nearly a year.

I believe my problem might have been caused by a drive firmware problem or something like that. After that error, the tape drive was totally unresponsive. When I tried to query it using mt status or tapeinfo, the following errors showed up in the syslog:

scsi: reservation conflict: host 0 channel 1 id 6 lun 0
st0: Error 70018 (sugg. bt 0x0, driver bt 0x0, host bt 0x7).

After a cold restart, the drive works again now.
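
For reference, the hex number in the st0 lines above is a packed result word. The following is a minimal sketch, assuming the usual Linux layout (status in bits 0-7, message in bits 8-15, host in bits 16-23, driver in bits 24-31) and the classic DID_* host byte names; decode_scsi_result is a hypothetical helper, not part of any tool mentioned in this thread. Under that assumption, 70018 splits into host byte 0x7 (matching the printed "host bt 0x7") and status 0x18, which is the SCSI status code for a reservation conflict and so agrees with the message logged just before it.

```python
# Sketch only: split the result word printed by the Linux st driver
# (e.g. "Error 70018") into its four bytes. The byte layout and the
# DID_* names are assumptions about the kernel in use.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result: int) -> dict:
    """Return the status, msg, host and driver bytes of a result word."""
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": host,
        "driver": (result >> 24) & 0xFF,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }


# The two error words quoted in this thread.
for word in ("70018", "400f4"):
    print(word, decode_scsi_result(int(word, 16)))
```

This only decodes what the kernel already reported; it does not say why the drive held a reservation or stopped responding.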

Then let's hope it remains that way... Unfortunately, I have experienced similar problems, though not often, and I could never find any definite reason for them.

I suspect that there may be slight instabilities in the SCSI HBAs I use, but nothing I could prove. And, also, I use rather aged hardware here, so it's quite possibly related to hardware errors, too.

However, the other issue is that Bacula has apparently been informed about the error, but it seems to have interpreted it as an end-of-medium condition instead of aborting the backup:

Yup, that's normal, for two reasons:
- First, logically finishing the volume after an error is Bacula's only choice.
- Second, there is, to my knowledge, no portable, reliable way to distinguish between EOM and real hardware errors from an application.

-------------------- snipp! --------------------
14-Jan 01:57 samba-sd: Samba-Homes.2006-01-14_01.05.00 Error:
block.c:538 Write error at 6:3819 on device "DLT1" (/dev/nst0).
ERR=Input/output error.
14-Jan 01:57 samba-sd: Samba-Homes.2006-01-14_01.05.00 Error: Error
writing final EOF to tape. This Volume may not be readable.
dev.c:1553 ioctl MTWEOF error on "DLT1" (/dev/nst0). ERR=Input/output error.
14-Jan 01:57 samba-sd: End of medium on Volume "Weekly-2005-06-12_8"
Bytes=6,245,922,599 Blocks=96,818 at 14-Jan-2006 01:57.
-------------------- snipp! --------------------

Is there an option to let bacula abort a backup after a write error?

Not as far as I know, because then it might choke on any EOM and on any temporary failure.
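
Lacking such an option, one workaround is to check the job log afterwards for volumes that were closed right after a write error. The following is a minimal sketch of that idea, based only on the message wording in the log excerpt above; suspicious_eom_volumes is a hypothetical helper, and it assumes (unverified) that a "Write error" line is not also printed when a tape is genuinely full.

```python
import re

# Matches the volume name in messages such as:
#   End of medium on Volume "Weekly-2005-06-12_8"
EOM_RE = re.compile(r'End of medium on Volume "([^"]+)"')


def suspicious_eom_volumes(log_lines):
    """Return volumes whose "End of medium" followed a "Write error"."""
    suspects = []
    saw_write_error = False
    for line in log_lines:
        if "Write error" in line:
            saw_write_error = True
        match = EOM_RE.search(line)
        if match:
            if saw_write_error:
                suspects.append(match.group(1))
            # Start fresh for whatever volume is mounted next.
            saw_write_error = False
    return suspects


# The excerpt quoted in this thread, one list entry per log line.
sample = [
    '14-Jan 01:57 samba-sd: Samba-Homes.2006-01-14_01.05.00 Error:',
    'block.c:538 Write error at 6:3819 on device "DLT1" (/dev/nst0).',
    'ERR=Input/output error.',
    '14-Jan 01:57 samba-sd: End of medium on Volume "Weekly-2005-06-12_8"',
    'Bytes=6,245,922,599 Blocks=96,818 at 14-Jan-2006 01:57.',
]
print(suspicious_eom_volumes(sample))  # ['Weekly-2005-06-12_8']
```

A volume flagged this way is only a candidate for a closer look, for example by comparing its byte count with what the tape normally holds.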

Cheers,
--leo

P.S.: I'd also still be interested in other users' experiences with the durability of DLT tapes and some recommendations on drive cleaning and maintenance.

I must have missed that the last time...
Well, here I use a really aged 7-slot DLT4000 autoloader. I'm not sure how old it actually is, but for those of you who followed the development: It's an ADIC FastStor 4000 which I bought second hand. Might be about eight years old. When I installed it, according to the internal counters, it had not been used very often. Currently, it claims to have loaded the drive fewer than 2200 times. It usually works without problems.

I have also tried used DLT tapes, and usually found them to be reliable in day-to-day use. These cartridges had usually been used only once or twice, though, and then stored in good conditions. Anyway, with my usage profile, I can't find any significant difference in reliability between new and used DLT-IV tapes written in DLT4000 format. I'd estimate the percentage of problematic or even damaged tapes to be below 5%. I test used tapes by writing some GB of random data to them, and I consider a tape bad when any of the following happens: a write failure during the test write; problems labeling the tape that don't disappear after manually writing some random data, testing with btape, and re-labeling; or a reduced capacity during operation. Currently, in my Bacula catalog, I have 64 DLT-IV tapes, of which 2 are damaged: one was a used tape I could never write any data to, and one was removed some time ago because, after recycling, it couldn't be recognized any more. I have not yet tested whether that tape is still writeable because I wouldn't trust it any more anyway.

Concerning drive maintenance: In my home-office environment (i.e. neither clean-room conditions nor cigarette smoke or unusual amounts of dust), I have to clean the DLT drive every few months, or after about 15-25 tapes written. (Usually, I follow the drive's indication that it needs to be cleaned.) That's about what I know as normal DLT cleaning cycles from other sites, too. In a dusty environment, especially with cigarette smoke, you will need to clean more often, and you will have to replace the drive much more often.

In short: I know DLT as a very robust technology, both concerning drives and tapes.

Arno

Alexander Bergolth wrote:

Hi!

Tonight, I got the following I/O error for the first time.
The drive is a DLT1:
  Vendor: BNCHMARK  Model: DLT1              Rev: 5538
  Type:   Sequential-Access                  ANSI SCSI revision: 02

-------------------- snipp! --------------------
14-Jan 01:55 samba-sd: Writing spooled data to Volume. Despooling 536,904,185 bytes ...
14-Jan 01:57 samba-sd: Samba-Homes.2006-01-14_01.05.00 Error:
block.c:538 Write error at 6:3819 on device "DLT1" (/dev/nst0).
ERR=Input/output error.
14-Jan 01:57 samba-sd: Samba-Homes.2006-01-14_01.05.00 Error: Error
writing final EOF to tape. This Volume may not be readable.
dev.c:1553 ioctl MTWEOF error on "DLT1" (/dev/nst0). ERR=Input/output error.
14-Jan 01:57 samba-sd: End of medium on Volume "Weekly-2005-06-12_8"
Bytes=6,245,922,599 Blocks=96,818 at 14-Jan-2006 01:57.
14-Jan 01:57 samba-dir: Recycled volume "Weekly-2005-04-22_2"
14-Jan 01:57 samba-sd: Please mount Volume "Weekly-2005-04-22_2" on Storage Device "DLT1" (/dev/nst0) for Job Samba-Homes.2006-01-14_01.05.00
-------------------- snipp! --------------------

This is the dmesg output:
-------------------- snipp! --------------------
st0: Error with sense data: <6>st0: Current: sense key: Not Ready
    Additional sense: Logical unit not ready, initializing cmd. required
st0: Error 400f4 (sugg. bt 0x0, driver bt 0x0, host bt 0x4).
st0: Error 400f4 (sugg. bt 0x0, driver bt 0x0, host bt 0x4).
[...]
-------------------- snipp! --------------------

As I don't have very much experience with DLT tapes:
Does this indicate a worn-out tape, or should I simply clean the drive with the cleaning tape?

How many backups should be possible with a DLT tape / at which intervals should a tape be changed?

At which intervals should the drive be cleaned?

In this case, the current job is still running and waiting for another tape. Is there an option to let Bacula abort the job if such an error occurs?

Cheers,
--leo




--
IT-Service Lehmann                    [EMAIL PROTECTED]
Arno Lehmann                  http://www.its-lehmann.de


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
