Hello Andrew,

Is /dev/changer created by a udev rule? Have you tried using the /dev/sgX device instead? Can you send us the output of the "lsscsi -l" and "dmesg | grep Attached" commands? Have you checked whether your drives and autochanger work using only the mtx/mt commands? Which version of mtx are you using?
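A small sketch of how the two outputs requested above could be read. It assumes the usual Linux formats, which can vary between kernel and lsscsi versions. In dmesg, the sg driver logs lines such as "Attached scsi generic sg3 type 8", where SCSI peripheral type 8 is a medium changer. "lsscsi -l" prints a second, indented line per device that contains "timeout=<seconds>". The function names are made up for this example.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, and the input formats are
# assumed to match typical Linux dmesg / lsscsi -l output.

# Print the /dev/sgN node of every SCSI medium changer (peripheral type 8)
# found in "dmesg | grep Attached" style input on stdin.
changer_sg_from_dmesg() {
  awk '/Attached scsi generic/ && $(NF-1) == "type" && $NF == 8 {
    print "/dev/" $(NF-2)
  }'
}

# Print "<device node> <timeout seconds>" for each device in "lsscsi -l"
# output on stdin. Device lines start with "[H:C:T:L]" and end with the
# device node; the indented line after each one carries "timeout=N".
timeouts_from_lsscsi() {
  awk '
    /^\[/ { dev = $NF }
    /timeout=/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^timeout=/) {
          t = $i
          sub(/^timeout=/, "", t)
          print dev, t
        }
      }
    }'
}
```

On the storage daemon host these would be used as "dmesg | grep Attached | changer_sg_from_dmesg" and "lsscsi -l | timeouts_from_lsscsi". The sg node that the first one prints can then be tested directly with "mtx -f /dev/sgN status" before going through Bacula.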
Best regards,
Ana

On Wed, Jun 17, 2015 at 7:29 PM, Marcin Haba <ganius...@gmail.com> wrote:
> Hello,
>
> Do you have any errors in dmesg (hardware errors, bus resets, SCSI
> errors, etc.)?
>
> Best regards,
> Marcin Haba (gani)
>
> 2015-06-17 21:56 GMT+02:00 Andrew Noonan <anoo...@gmail.com>:
> > Hi all,
> >
> > It's taking a lot longer because of the higher timeouts, but the
> > label is still failing with a termination. If I understand it
> > correctly, the mtx-changer script polls with 'mt', looking for the
> > $ready state, which is defined in the config file as ONLINE (for Linux). I'm
> > not seeing drive 0 go into that state... I just see:
> >
> > SCSI 2 tape drive:
> > File number=-1, block number=-1, partition=0.
> > Tape block size 0 bytes. Density code 0x0 (default).
> > Soft error count since last status=0
> > General status bits on (50000):
> > DR_OPEN IM_REP_EN
> >
> > The other device looks like:
> >
> > SCSI 2 tape drive:
> > File number=0, block number=0, partition=0.
> > Tape block size 0 bytes. Density code 0x5a (no translation).
> > Soft error count since last status=0
> > General status bits on (41010000):
> > BOT ONLINE IM_REP_EN
> >
> > So I see that it's ~possible~ to reach the ONLINE state, but the drive
> > never seems to get there during a load.
> >
> > Any thoughts?
> >
> > Thanks,
> > Andrew
> >
> > On Wed, Jun 17, 2015 at 11:44 AM, Andrew Noonan <anoo...@gmail.com> wrote:
> >> Hi Ana,
> >>
> >> Thanks for the reply. I'm adding those to the drives. BTW,
> >> 900 is the value. Having no real experience with these, is it
> >> abnormal for a load to take 10+ minutes, or is that reasonable?
> >> My next step is to add those settings, restart the SD, and attempt
> >> a "label barcode" again.
> >>
> >> Thanks,
> >> Andrew
> >>
> >> On Tue, Jun 16, 2015 at 9:10 PM, Ana Emília M. Arruda <emiliaarr...@gmail.com> wrote:
> >>> Hello Andrew,
> >>>
> >>> You can find the timeout for your drives in the output of the
> >>> "lsscsi -l" command. Then you can configure three timeout directives
> >>> for each of your two drives (LRADrive-1 and LRADrive-2):
> >>>
> >>> Maximum Changer Wait = X
> >>> Maximum Rewind Wait = X
> >>> Maximum Open Wait = X
> >>>
> >>> where X is the timeout value for your drives.
> >>>
> >>> You can also adapt your mtx-changer script to this timeout by
> >>> changing the 300-second value below:
> >>>
> >>> wait_for_drive() {
> >>>   i=0
> >>>   while [ $i -le 300 ]; do  # Wait max 300 seconds
> >>>
> >>> Best regards,
> >>> Ana
> >>>
> >>>
> >>> On Tue, Jun 16, 2015 at 5:02 PM, Andrew Noonan <anoo...@gmail.com> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I'm almost completely new to tape. We've been doing disk-based
> >>>> backups for years, but we now have a project where we want to offsite
> >>>> hundreds of TB permanently, and we have a Dell TL4000 (a rebranded IBM
> >>>> 3573-TL from the looks of it) with 2 ULT3580 LTO-6 drives. We're
> >>>> running Bacula 5.2. The server is a Dell 1950 running CentOS 5 (sorry
> >>>> for the old OS).
> >>>>
> >>>> The btape tests run on both units without a problem, including the
> >>>> autochanger tests, and manually executing load/unload/list commands
> >>>> with mtx-changer seems to work fine. The one exception is that
> >>>> the mtx-changer load command takes about 10 minutes to
> >>>> complete, which seems unreasonably long. These are brand new tapes
> >>>> and I haven't written anything to them other than whatever btape does
> >>>> during testing. I put a 5-minute sleep on the load in mtx-changer, but
> >>>> other than that I haven't customized the script, as I'm not sure what
> >>>> I'd customize.
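The polling described in this thread can be sketched as a loop with the limit made a parameter. mtx-changer runs 'mt' until the drive reports the $ready state (ONLINE on Linux), for at most 300 seconds. This is not the stock mtx-changer code. STATUS_CMD and POLL_INTERVAL are hypothetical hooks added so the loop can be exercised without a drive. The 900-second default simply reuses the lsscsi timeout reported in the thread.

```sh
#!/bin/sh
# Sketch only, not the stock mtx-changer function.
# READY_PATTERN mirrors the $ready setting (ONLINE on Linux).
READY_PATTERN="${READY_PATTERN:-ONLINE}"

# Succeed if the "mt status" text on stdin reports the ready state.
status_is_ready() {
  grep -q -- "$READY_PATTERN"
}

# Poll a drive until it is ready or the limit is reached.
#   $1 = drive device (e.g. /dev/nst1)
#   $2 = maximum number of polls (default 900, about 900 s at 1 s/poll)
# Hypothetical hooks: STATUS_CMD (default "mt -f") and POLL_INTERVAL (default 1).
# Variables are global because plain POSIX sh has no "local".
wait_for_drive() {
  drive=$1
  max_wait=${2:-900}
  i=0
  while [ "$i" -le "$max_wait" ]; do
    # STATUS_CMD is left unquoted on purpose so "mt -f" splits into two words.
    if ${STATUS_CMD:-mt -f} "$drive" status 2>&1 | status_is_ready; then
      return 0
    fi
    sleep "${POLL_INTERVAL:-1}"
    i=$((i + 1))
  done
  return 1
}
```

With a real drive this would be called as "wait_for_drive /dev/nst1 900". The loop only looks for ONLINE. A drive that stays in the state quoted for drive 0 above (DR_OPEN, File number=-1) keeps it waiting until the limit expires.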
> >>>>
> >>>> The "update slots" command from the Director works OK, but when I
> >>>> do a "label barcode", the resulting "load slot" gets killed by
> >>>> Bacula:
> >>>>
> >>>> 3992 Bad autochanger "load slot 20, drive 1": ERR=Child died from
> >>>> signal 15: Termination.
> >>>> Results=Program killed by Bacula (timeout)
> >>>>
> >>>> I've seen in some posts to this list that this turns out to be a
> >>>> permissions problem on the devices, but that doesn't seem to be
> >>>> the case as far as I can see:
> >>>>
> >>>> bacula-sd is running as the bacula user/group. The bacula user is in
> >>>> the "disk" group, and the *st* devices are in the disk group with "rw"
> >>>> permissions:
> >>>>
> >>>> crw-rw---- 1 root disk 9, 128 Jun 4 12:02 /dev/nst0
> >>>> crw-rw---- 1 root disk 9, 224 Jun 4 12:02 /dev/nst0a
> >>>> crw-rw---- 1 root disk 9, 160 Jun 4 12:02 /dev/nst0l
> >>>> crw-rw---- 1 root disk 9, 192 Jun 4 12:02 /dev/nst0m
> >>>> crw-rw---- 1 root disk 9, 129 Jun 10 17:06 /dev/nst1
> >>>> crw-rw---- 1 root disk 9, 225 Jun 10 17:06 /dev/nst1a
> >>>> crw-rw---- 1 root disk 9, 161 Jun 10 17:06 /dev/nst1l
> >>>> crw-rw---- 1 root disk 9, 193 Jun 10 17:06 /dev/nst1m
> >>>> crw-rw---- 1 root disk 9, 0 Jun 4 12:02 /dev/st0
> >>>> crw-rw---- 1 root disk 9, 96 Jun 4 12:02 /dev/st0a
> >>>> crw-rw---- 1 root disk 9, 32 Jun 4 12:02 /dev/st0l
> >>>> crw-rw---- 1 root disk 9, 64 Jun 4 12:02 /dev/st0m
> >>>> crw-rw---- 1 root disk 9, 1 Jun 10 17:06 /dev/st1
> >>>> crw-rw---- 1 root disk 9, 97 Jun 10 17:06 /dev/st1a
> >>>> crw-rw---- 1 root disk 9, 33 Jun 10 17:06 /dev/st1l
> >>>> crw-rw---- 1 root disk 9, 65 Jun 10 17:06 /dev/st1m
> >>>>
> >>>> Here's a block of debug output from the SD during a label attempt
> >>>> for one of the slots:
> >>>>
> >>>> odin-sd: autochanger.c:434-0 Wiffle through devices looking for slot
> >>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
> >>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0
> >>>> odin-sd: autochanger.c:272-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0
> >>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
> >>>> odin-sd: bpipe.c:220-0 Wait for 28962 opt=1
> >>>> odin-sd: bpipe.c:228-0 Got break wpid=28962 status=0 ERR=none
> >>>> odin-sd: bpipe.c:249-0 child status=0
> >>>> odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888
> >>>> odin-sd: bpipe.c:264-0 returning stat=0,0
> >>>> odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0 stat=0 result=0
> >>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
> >>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
> >>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1
> >>>> odin-sd: autochanger.c:272-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1
> >>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
> >>>> odin-sd: bpipe.c:220-0 Wait for 28976 opt=1
> >>>> odin-sd: bpipe.c:228-0 Got break wpid=28976 status=0 ERR=none
> >>>> odin-sd: bpipe.c:249-0 child status=0
> >>>> odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888
> >>>> odin-sd: bpipe.c:264-0 returning stat=0,0
> >>>> odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1 stat=0 result=0
> >>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
> >>>> odin-sd: autochanger.c:453-0 Slot=14 not found in another device
> >>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
> >>>> odin-sd: autochanger.c:183-0 Doing changer load slot 14 "LRADrive-2" (/dev/nst1)
> >>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer load 14 /dev/nst1 1
> >>>> odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1)
> >>>> odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed vol=
> >>>> odin-sd: autochanger.c:190-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer load 14 /dev/nst1 1
> >>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
> >>>> odin-sd: bpipe.c:443-0 Run program fgets killed=1
> >>>> odin-sd: bpipe.c:220-0 Wait for 28990 opt=1
> >>>> odin-sd: bpipe.c:228-0 Got break wpid=28990 status=15 ERR=none
> >>>> odin-sd: bpipe.c:256-0 Child died from signal 15
> >>>> odin-sd: watchdog.c:235-0 Unregistered inactive watchdog 636b888
> >>>> odin-sd: bpipe.c:264-0 returning stat=15,134217743
> >>>> odin-sd: autochanger.c:205-0 load slot 14, drive 1, bad stats=Child died from signal 15: Termination.
> >>>> odin-sd: autochanger.c:212-0 load slot 14 status=134217743
> >>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
> >>>> odin-sd: autochanger.c:218-0 After changer, status=134217743
> >>>> odin-sd: dev.c:1735-0 Clear volhdr vol=
> >>>> odin-sd: vol_mgr.c:544-0 vol_unused: no vol on "LRADrive-2" (/dev/nst1)
> >>>> odin-sd: lock.c:302-0 return lock. old=BST_WRITING_LABEL from dircmd.c:554
> >>>> odin-sd: lock.c:307-0 return lock. new=BST_NOT_BLOCKED
> >>>> odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1)
> >>>> odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed vol=
> >>>> odin-sd: acquire.c:731-0 Enter detach_dcr_from_dev
> >>>> odin-sd: dircmd.c:220-0 <dird: label LogRepoAutochanger VolumeName=000030L6 PoolName=LogrepoArchive MediaType=LTO-6 Slot=15 drive=1
> >>>> odin-sd: dircmd.c:234-0 Do command: label
> >>>> odin-sd: dircmd.c:627-0 Try changer device LRADrive-1
> >>>> odin-sd: dircmd.c:648-0 Device LogRepoAutochanger drive wrong: want=1 got=0 skipping
> >>>> odin-sd: dircmd.c:627-0 Try changer device LRADrive-2
> >>>> odin-sd: dircmd.c:643-0 Found changer device LRADrive-2
> >>>> odin-sd: dircmd.c:656-0 Found device LRADrive-2
> >>>> odin-sd: block.c:144-0 Returning new block=636b800
> >>>> odin-sd: acquire.c:713-0 JobId=0 enter attach_dcr_to_dev
> >>>> odin-sd: dircmd.c:421-0 Can label. Device is not open
> >>>> odin-sd: lock.c:285-0 steal lock. old=BST_NOT_BLOCKED from dircmd.c:470
> >>>> odin-sd: lock.c:290-0 steal lock. new=BST_WRITING_LABEL
> >>>> odin-sd: dircmd.c:471-0 Stole device "LRADrive-2" (/dev/nst1) lock, writing label.
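The status numbers in the log above can be checked with a little shell arithmetic. 134217743 is 134217728 + 15, that is, 1<<27 plus signal 15 (SIGTERM), which agrees with the "stat=15,134217743" line. Treating bit 27 as Bacula's "child died from a signal" flag (b_errno_signal) is an assumption based on a reading of Bacula's source, so verify it against your version. Just before the load, the same log also registers a watchdog with "interval 300". A load that needs about 10 minutes (600 seconds) cannot finish inside a 300-second window.

```sh
#!/bin/sh
# Sketch: decode the status word logged by the SD after running mtx-changer.
# ASSUMPTION: bit 27 (1<<27 = 134217728) flags "child killed by a signal",
# and the low bits then hold the signal number.
SIGNAL_FLAG=$((1 << 27))

decode_bacula_status() {
  code=$1
  if [ $((code & SIGNAL_FLAG)) -ne 0 ]; then
    # Mask off the flag bit (and everything above it) to get the signal.
    echo "signal $((code & (SIGNAL_FLAG - 1)))"
  else
    echo "status $code"
  fi
}
```

For example, "decode_bacula_status 134217743" prints "signal 15".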
> >>>>
> >>>> The config I've got for these is:
> >>>>
> >>>> Device {
> >>>>   Name = LRADrive-1
> >>>>   Alert Command = "sh -c 'smartctl -H -l error %c'"
> >>>>   AlwaysOpen = yes
> >>>>   ArchiveDevice = /dev/nst0
> >>>>   AutoChanger = yes
> >>>>   AutomaticMount = yes
> >>>>   DeviceType = Tape
> >>>>   DriveIndex = 0
> >>>>   LabelMedia = no
> >>>>   MediaType = LTO-6
> >>>>   RandomAccess = no
> >>>>   RemovableMedia = yes
> >>>> }
> >>>>
> >>>> Device {
> >>>>   Name = LRADrive-2
> >>>>   Alert Command = "sh -c 'smartctl -H -l error %c'"
> >>>>   AlwaysOpen = yes
> >>>>   ArchiveDevice = /dev/nst1
> >>>>   AutoChanger = yes
> >>>>   AutomaticMount = yes
> >>>>   DeviceType = Tape
> >>>>   DriveIndex = 1
> >>>>   LabelMedia = no
> >>>>   MediaType = LTO-6
> >>>>   RandomAccess = no
> >>>>   RemovableMedia = yes
> >>>> }
> >>>>
> >>>> Autochanger {
> >>>>   Name = LogRepoAutochanger
> >>>>   ChangerCommand = "/usr/lib64/bacula/mtx-changer %c %o %S %a %d"
> >>>>   ChangerDevice = /dev/changer
> >>>>   Device = LRADrive-1
> >>>>   Device = LRADrive-2
> >>>> }
> >>>>
> >>>> I know there are some things here that could be optimized for
> >>>> performance, and I'm certainly interested in them, but right now I
> >>>> can't even label my tapes :)
> >>>>
> >>>> I suspect it's the long load delay, and I wasn't sure if maybe the
> >>>> drive is searching for some mark or something. On that note, I tried
> >>>> to do a "rewind" and "weof" using the /dev/st0 device (I wasn't sure if
> >>>> nst0 would complain about issuing a rewind), but I got
> >>>> "Input/Output error" messages from mt on both the rewind and weof
> >>>> commands.
> >>>>
> >>>> Any advice would be helpful.
> >>>>
> >>>> Thanks!
> >>>> Andrew
> >>>>
> >>>> ------------------------------------------------------------------------------
> >>>> _______________________________________________
> >>>> Bacula-users mailing list
> >>>> Bacula-users@lists.sourceforge.net
> >>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
> >>>
> >
>
> --
> "Greater love hath no man than this, that a man lay down his life for his friends." Jesus Christ
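Neither Device resource in Andrew's original configuration sets the Maximum Changer Wait, Maximum Rewind Wait, or Maximum Open Wait directives that Ana recommends, so the SD presumably falls back to its defaults for them. Below is a quick way to check a Device resource for these directives. The function name is hypothetical. The sketch relies on Bacula ignoring case and spaces in directive keywords, so "Maximum Changer Wait" and "MaximumChangerWait" compare equal. It does not strip comments, so a commented-out directive is counted as present.

```sh
#!/bin/sh
# Sketch: list which of the three timeout directives are absent from the
# Bacula Device resource text on stdin (hypothetical helper).
missing_wait_directives() {
  # Drop blanks and lower-case everything, mirroring how Bacula treats keywords.
  normalized=$(tr -d '[:blank:]' | tr '[:upper:]' '[:lower:]')
  for d in maximumchangerwait maximumrewindwait maximumopenwait; do
    case $normalized in
      *"$d="*) ;;          # directive present
      *) echo "$d" ;;      # directive missing
    esac
  done
}
```

Run against the LRADrive-1 resource quoted in this thread, it reports all three directives as missing.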