Hi Dan,

On Saturday 25 August 2007 03:41:55 pm Dan Langille wrote:
> On 25 Aug 2007 at 1:32, Ivan Adzhubey wrote:
> > Hi,
> >
> > I am getting the following errors while running large jobs with data
> > spanning >2 tapes:
> >
> > 23-Aug 21:37 rosalind-sd: 3305 Autochanger "load slot 7, drive 0", status
> > is OK.
> > 23-Aug 21:37 rosalind-sd: 3301 Issuing autochanger "loaded drive 0"
> > command.
> > 23-Aug 21:37 rosalind-sd: 3302 Autochanger "loaded drive 0",
> > result is Slot 7.
> > 23-Aug 21:38 rosalind-sd: Recycled volume
> > "Chromosome0031" on device "Drive-1" (/dev/nst0), all previous data lost.
> > 23-Aug 21:38 rosalind-sd: New volume "Chromosome0031" mounted on
> > device "Drive-1" (/dev/nst0) at 23-Aug-2007 21:38.
> > 23-Aug 21:38 rosalind-sd: fantom-dataOut.2007-08-22_23.10.12 Error:
> > block.c:538 Write error at 0:1756 on device "Drive-1" (/dev/nst0).
> > ERR=Input/output error.
>
> Try adding a sleep to the changer script. Sometimes the tape drive
> is still settling when the write is attempted.

I did, this section of my mtx-changer script looks like this:

case $cmd in
   unload)
      debug "Doing mtx -f $ctl unload $slot $drive"
#
# enable the following line if you need to eject the cartridge
      mt -f $device offline
      sleep 10
      ${MTX} -f $ctl unload $slot $drive
      ;;

   load)
      debug "Doing mtx -f $ctl load $slot $drive"
      ${MTX} -f $ctl load $slot $drive
      rtn=$?
#
# Increase the sleep time if you have a slow device
# or remove the sleep and add the following:
#     sleep 15
      wait_for_drive $device
      exit $rtn
      ;;

As you can see, I do have "sleep 10" after offline and "wait_for_drive"
after load. I used to have sleep 15 after load instead, and it worked the
same. All autochanger tests pass without a problem. Any suggestions where to
insert more delays in the script?

> > 23-Aug 21:38 rosalind-sd: fantom-dataOut.2007-08-22_23.10.12 Error: Error
> > writing final EOF to tape. This Volume may not be readable.
> > dev.c:1542 ioctl MTWEOF error on "Drive-1" (/dev/nst0). ERR=Input/output
> > error.
> > 23-Aug 21:38 rosalind-sd: End of medium on Volume "Chromosome0031"
> > Bytes=113,218,556 Blocks=1,755 at 23-Aug-2007 21:38.
> >
> > This started with version 1.36.1 that we were running originally and
> > persisted through upgrade to 1.38.11. I have built and installed version
> > 2.2.0 today but haven't run large backups yet. I am trying to test and
> > eliminate any possible hardware/driver configuration problems first.
> > Regular btape "test" and "auto" tests completed perfectly, now I want to
> > run "fill" test but documentation states multiple-tape variant is still
> > not operational. Is it true?
>
> Multi-volume backups has and still is a vital feature of Bacula. See
> http://www.bacula.org/rel-manual/Current_State_Bacula.html:
>
> "Multi-volume saves. When a Volume is full, Bacula automatically
> requests the next Volume and continues the backup."
>
> Granted, it could be worded better... :)
>
> Where did you see otherwise? We should amend that.

It's not in the main documentation but in the "Testing Your Tape..." chapter.
It only refers to the "fill" command as implemented in btape:

http://www.bacula.org/rel-manual/Testing_Your_Tape_Drive.html#TapeTestingChapter

"Using btape to Simulate Filling a Tape
<...skipped...>
To begin this test, you enter the fill command and follow the instructions.
There are two options: the simple single tape option and the multiple tape
option. Please use only the simple single tape option because the multiple
tape option still doesn't work totally correctly. If the single tape option
does not succeed, you should correct the problem before using Bacula."

> > Does it mean version 2.2.0 in general can still have problems with
> > multi-volume backups?
>
> It should not, but anything is possible.

Bacula used to run multi-volume jobs here just fine for years until very
recently. Still, it is only 3- or 4-volume jobs that constantly fail; 2-volume
ones are OK. It also looks like as soon as the first error occurs, Bacula
loses track of files/volumes completely, and all subsequent attempts to change
a volume in the middle of the job fail.

I've lost most of my previous backup data through this error already. This,
however, was my mistake, so I can't complain: I was keeping purged tapes with
old data in the changer unprotected, and the overnight backup that triggered
this error recycled and trashed them all. Since every attempt to change a
volume was failing, Bacula just kept recycling volumes until none were left,
before I noticed. It never happened before, so I grew sort of overconfident
in it ;-(

> > The server is running Linux kernel 2.4.18smp and have Qualstar RLS-4445
> > autochanger with single Sony SDX-700C AIT drive attached to Adaptec 3960D
> > Ultra160 SCSI adapter (aic7xxx driver). SCSI RAID is also attached to the
> > first channel of the same dual-host adapter.
> > I read that sharing SCSI
> > adapters with other devices may create problems but it was running just
> > fine for 4 years in this configuration until recently when the amount of
> > backup data increased. The problem seems to only appear when a single
> > backup job spans 3 or more volumes; spanning 2 volumes has yet to produce
> > an error, although I haven't run too many of large jobs - they take a lot
> > of time. But every 2+ volume job I've tried has failed.
> >
> > Here's the tapeinfo output:
> >
> > # tapeinfo -f /dev/sg2
> > Product Type: Tape Drive
> > Vendor ID: 'SONY '
> > Product ID: 'SDX-700C '
> > Revision: '0103'
> > Attached Changer: No
> > SerialNumber: '0002084649'
> > MinBlock:2
> > MaxBlock:16777215
> > SCSI ID: 1
> > SCSI LUN: 0
> > Ready: yes
> > BufferedMode: yes
> > Medium Type: Not Loaded
> > Density Code: 0x32
> > BlockSize: 0
> > DataCompEnabled: yes
> > DataCompCapable: yes
> > DataDeCompEnabled: yes
> > CompType: 0x3
> > DeCompType: 0x0
> > BOP: yes
> > Block Position: 0
>
> Here is my two tape test with btape. Does your test run OK?
>
> http://www.freebsddiary.org/digital-tl891.php
>
> Look for: ape -c /usr/local/etc/bacula-sd.conf /dev/nsa0

Yep, ran it many times, tested half of my tapes already, including the ones
that failed during actual backups. Not a single glitch. Ran single-volume
btape "fill" test already, no errors either. Running a large test backup
right now, should take about 12 hours to complete. If it fails, I will
consider disabling Hardware End of Medium as per your suggestions.

Thanks for a link, very useful!

Cheers,
Ivan

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
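
For anyone following the sleep / wait_for_drive discussion in the mtx-changer
excerpt above, here is a minimal sketch of a polling helper with an explicit
timeout and exit status. It is not the wait_for_drive that ships with Bacula:
the names wait_until_online, mt_status and STATUS_CMD are hypothetical, and it
assumes that "mt -f <device> status" prints the word ONLINE once the drive is
ready (as the Linux mt-st tool does).

```shell
#!/bin/sh
# Hypothetical helper, NOT the stock Bacula wait_for_drive.
# STATUS_CMD is an assumption made for testability: it names the command
# (or shell function) that prints the drive status for a device.
STATUS_CMD=${STATUS_CMD:-mt_status}

# Default status probe: ask mt for the drive status.
mt_status() {
  mt -f "$1" status
}

# wait_until_online DEVICE [MAX_TRIES] [DELAY_SECONDS]
# Polls the status command until its output contains ONLINE.
# Returns 0 when the drive reports ONLINE, 1 if all tries are used up.
wait_until_online() {
  device=$1
  max_tries=${2:-300}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    if $STATUS_CMD "$device" 2>/dev/null | grep -q ONLINE; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

In the load) branch this could stand in for a fixed sleep, for example
wait_until_online "$device" 300 1 || exit 1. Whether a timeout should make
the load fail outright is a judgment call; the point of the sketch is only
that the wait has an upper bound and reports whether the drive came ready.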