Hi Dan,

On Saturday 25 August 2007 03:41:55 pm Dan Langille wrote:
> On 25 Aug 2007 at 1:32, Ivan Adzhubey wrote:
> > Hi,
> >
> > I am getting the following errors while running large jobs with data
> > spanning more than 2 tapes:
> >
> > 23-Aug 21:37 rosalind-sd: 3305 Autochanger "load slot 7, drive 0", status is OK.
> > 23-Aug 21:37 rosalind-sd: 3301 Issuing autochanger "loaded drive 0" command.
> > 23-Aug 21:37 rosalind-sd: 3302 Autochanger "loaded drive 0", result is Slot 7.
> > 23-Aug 21:38 rosalind-sd: Recycled volume "Chromosome0031" on device "Drive-1" (/dev/nst0), all previous data lost.
> > 23-Aug 21:38 rosalind-sd: New volume "Chromosome0031" mounted on device "Drive-1" (/dev/nst0) at 23-Aug-2007 21:38.
> > 23-Aug 21:38 rosalind-sd: fantom-dataOut.2007-08-22_23.10.12 Error: block.c:538 Write error at 0:1756 on device "Drive-1" (/dev/nst0). ERR=Input/output error.
>
> Try adding a sleep to the changer script.  Sometimes the tape drive
> is still settling when the write is attempted.

I did, this section of my mtx-changer script looks like this:

case $cmd in
   unload)
      debug "Doing mtx -f $ctl unload $slot $drive"
#
# enable the following line if you need to eject the cartridge
      mt -f $device offline
      sleep 10
      ${MTX} -f $ctl unload $slot $drive
      ;;

   load)
      debug "Doing mtx -f $ctl load $slot $drive"
      ${MTX} -f $ctl load $slot $drive
      rtn=$?
#
# Increase the sleep time if you have a slow device
# or remove the sleep and add the following:
#     sleep 15
      wait_for_drive $device
      exit $rtn
      ;;

As you can see, I do have "sleep 10" after offline and "wait_for_drive" after 
load. I used to have sleep 15 after load instead, and it worked the same. All 
autochanger tests pass without a problem. Any suggestions where to insert 
more delays in the script?
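
For reference, a polling loop of the kind wait_for_drive performs can be
sketched as follows. This is a minimal sketch, not the script shipped with
Bacula: the function name wait_until_ready and its four parameters are
hypothetical, and the "ONLINE" marker in the usage line assumes the status
text printed by the Linux mt-st tool.

```shell
#!/bin/sh
# Hypothetical helper: poll a status command until its output contains a
# "ready" marker, or give up after a fixed number of attempts.
#   $1 = command that prints the drive status (e.g. "mt -f /dev/nst0 status")
#   $2 = text that marks the drive as ready (assumption: "ONLINE" for mt-st)
#   $3 = maximum number of polls
#   $4 = seconds to sleep between polls
wait_until_ready() {
    status_cmd=$1
    ready_marker=$2
    max_tries=$3
    interval=$4
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        # $status_cmd is left unquoted on purpose so it splits into words
        if $status_cmd 2>/dev/null | grep -q "$ready_marker"; then
            return 0    # drive reported ready
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    return 1            # drive never became ready within the limit
}
```

In the load branch this would be called as
wait_until_ready "mt -f $device status" ONLINE 300 1, and a non-zero return
could be logged with debug so that a drive that never settles shows up in the
changer log instead of surfacing later as a write error.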

> > 23-Aug 21:38 rosalind-sd: fantom-dataOut.2007-08-22_23.10.12 Error: Error
> > writing final EOF to tape. This Volume may not be readable.
> > dev.c:1542 ioctl MTWEOF error on "Drive-1" (/dev/nst0). ERR=Input/output
> > error.
> > 23-Aug 21:38 rosalind-sd: End of medium on Volume "Chromosome0031"
> > Bytes=113,218,556 Blocks=1,755 at 23-Aug-2007 21:38.
> >
> > This started with version 1.36.1 that we were running originally and
> > persisted through upgrade to 1.38.11. I have built and installed version
> > 2.2.0 today but haven't run large backups yet. I am trying to test and
> > eliminate any possible hardware/driver configuration problems first.
> > Regular btape "test" and "auto" tests completed perfectly, now I want to
> > run "fill" test but documentation states multiple-tape variant is still
> > not operational. Is it true?
>
> Multi-volume backups have been, and still are, a vital feature of Bacula.  See
> http://www.bacula.org/rel-manual/Current_State_Bacula.html:
>
> "Multi-volume saves. When a Volume is full, Bacula automatically
> requests the next Volume and continues the backup."
>
> Granted, it could be worded better... :)
>
> Where did you see otherwise?  We should amend that.

It's not in the main documentation but in the "Testing Your Tape..." chapter, 
and it only refers to the "fill" command as implemented in btape:

http://www.bacula.org/rel-manual/Testing_Your_Tape_Drive.html#TapeTestingChapter

"Using btape to Simulate Filling a Tape

<...skipped...>

To begin this test, you enter the fill command and follow the instructions. 
There are two options: the simple single tape option and the multiple tape 
option. Please use only the simple single tape option because the multiple 
tape option still doesn't work totally correctly. If the single tape option 
does not succeed, you should correct the problem before using Bacula."

> > Does it mean version 2.2.0 in general can still have problems with
> > multi-volume backups?
>
> It should not, but anything is possible.

Bacula used to run multi-volume jobs here just fine for years, until very 
recently. Even now, only 3- or 4-volume jobs fail consistently; 2-volume 
ones are OK. It also looks like as soon as the first error occurs, Bacula 
loses track of files/volumes completely, and all subsequent attempts to change 
a volume in the middle of the job fail. I've already lost most of my previous 
backup data through this error. That, however, was my own mistake, so I 
can't complain: I was keeping purged tapes with old data in the changer 
unprotected, and the overnight backup that triggered this error recycled 
and trashed them all. Since every attempt to change a volume was failing, 
Bacula just kept recycling volumes until none were left, before I noticed. It 
had never happened before, so I grew sort of overconfident in it ;-(

> > The server is running Linux kernel 2.4.18smp and has a Qualstar RLS-4445
> > autochanger with a single Sony SDX-700C AIT drive attached to an Adaptec 3960D
> > Ultra160 SCSI adapter (aic7xxx driver). SCSI RAID is also attached to the
> > first channel of the same dual-host adapter. I read that sharing SCSI
> > adapters with other devices may create problems but it was running just
> > fine for 4 years in this configuration until recently when the amount of
> > backup data increased. The problem seems to only appear when a single
> > backup job spans 3 or more volumes; spanning 2 volumes has yet to produce
> > an error, although I haven't run too many large jobs - they take a lot
> > of time. But every job spanning more than 2 volumes that I've tried has failed.
> >
> > Here's the tapeinfo output:
> >
> > # tapeinfo -f /dev/sg2
> > Product Type: Tape Drive
> > Vendor ID: 'SONY    '
> > Product ID: 'SDX-700C        '
> > Revision: '0103'
> > Attached Changer: No
> > SerialNumber: '0002084649'
> > MinBlock:2
> > MaxBlock:16777215
> > SCSI ID: 1
> > SCSI LUN: 0
> > Ready: yes
> > BufferedMode: yes
> > Medium Type: Not Loaded
> > Density Code: 0x32
> > BlockSize: 0
> > DataCompEnabled: yes
> > DataCompCapable: yes
> > DataDeCompEnabled: yes
> > CompType: 0x3
> > DeCompType: 0x0
> > BOP: yes
> > Block Position: 0
>
> Here is my two tape test with btape.  Does your test run OK?
>
>   http://www.freebsddiary.org/digital-tl891.php
>
> Look for: btape -c /usr/local/etc/bacula-sd.conf /dev/nsa0

Yep, ran it many times, tested half of my tapes already, including the ones 
that failed during actual backups. Not a single glitch. Ran single-volume 
btape "fill" test already, no errors either. Running a large test backup 
right now, should take about 12 hours to complete. If it fails, I will 
consider disabling Hardware End of Medium as per your suggestions. Thanks for 
the link, very useful!
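
For reference, disabling Hardware End of Medium would be a one-line change in
the Device resource of bacula-sd.conf. A minimal sketch: the device name and
archive device are taken from the log above, the Media Type value is a
placeholder assumption, and the remaining directives of the existing resource
are omitted.

```
Device {
  Name = Drive-1
  Archive Device = /dev/nst0
  Media Type = AIT-3             # assumption: keep whatever value is already configured
  Hardware End of Medium = no    # SD then forward-spaces file by file to find the end of data
}
```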

Cheers,
Ivan

