Hello,
The output from lsscsi looks odd. From what I see, I am not reassured
that both the tape drives are actually part [...] one at a time and see
whether physically the right tapes are mounted.
I also am a bit skeptical about using a 40GB maximum file size on your
LTO-4 -- that seems *much* larger than what we recommend to be an
optimal compromise between restore speed and write speed.
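To put rough numbers on that trade-off, here is a small sketch. It assumes the published native figures (LTO-4: 800 GB at 120 MB/s, LTO-1: 100 GB at 20 MB/s), that "GB" in the configuration means 10^9 bytes, and that a restore may have to read forward through a whole tape file in the worst case. The function names are made up for illustration; the file sizes are the ones from the configuration quoted below.

```python
# Rough arithmetic only: capacities and rates are native (uncompressed) LTO
# figures, and the worst case assumes a full tape file must be read through.
GB = 1000**3
MB = 1000**2


def files_per_tape(capacity_bytes, max_file_bytes):
    """Number of tape files (EOF marks) needed to fill one volume."""
    return -(-capacity_bytes // max_file_bytes)  # ceiling division


def worst_case_read_seconds(max_file_bytes, rate_bytes_per_s):
    """Time to stream one whole tape file at the drive's native rate."""
    return max_file_bytes / rate_bytes_per_s


# "Maximum File Size" values taken from the two Device resources.
for label, capacity, file_size, rate in [
    ("LTO-4, 40GB files", 800 * GB, 40 * GB, 120 * MB),
    ("LTO-1, 2GB files", 100 * GB, 2 * GB, 20 * MB),
]:
    count = files_per_tape(capacity, file_size)
    seconds = worst_case_read_seconds(file_size, rate)
    print(f"{label}: {count} files per tape, up to {seconds:.0f} s per file")
```

Fewer, larger files mean fewer stops to write an EOF mark, but each file takes longer to read through, which is the restore-speed side of the compromise.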
My experience on LTO-1 and LTO-4 drives is that 512K buffer sizes get
quite adequate performance so I am a bit skeptical about your need for
1MB buffers, but that said, they should be OK.
Also you should probably be using device independent device names rather
than /dev/nst0 and /dev/nst1 as depending on the boot, the devices could
get swapped -- the same goes for the /dev/sg4 name.
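As a starting point for finding such names, the sketch below lists the persistent symlinks that udev typically creates on Linux and shows which device node each one currently points to. The directory /dev/tape/by-id is an assumption about your system (it may be missing or named differently), and the helper name is made up for illustration.

```python
import os


def persistent_tape_names(by_id_dir="/dev/tape/by-id"):
    """Map each symlink in by_id_dir to the device node it resolves to."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping  # directory not present on this system
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            mapping[link] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for link, target in persistent_tape_names().items():
        print(f"{link} -> {target}")
```

A symlink that resolves to the right non-rewinding node (an nst device) could then be used as the Archive Device instead of /dev/nst0 or /dev/nst1, so the configuration no longer depends on boot order.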
Others on this list should be able to help you with the details of my
suggestions ...
Best regards,
Kern
On 04/02/2018 01:53 PM, Sebastian Suchanek wrote:
On 02.04.2018 at 09:54, Kern Sibbald wrote:
Hello Kern,
thank you for your reply.
First, I would recommend that you use at *most* 1MB block sizes for
LTO-1 and LTO-4 tapes.
OK, I changed that for the LTO-4 drive. (I don't want to go below 1MB
though, because it significantly reduces write rates.)
[...]
You haven't shown your full autochanger device configuration so it will
be hard/impossible to diagnose your problem.
No problem, here's my full bacula-sd.conf, only comments and passwords
are removed:
| Storage {
| Name = tigersclaw-sd
| SDPort = 9103
| WorkingDirectory = "/var/lib/bacula"
| Pid Directory = "/var/run/bacula"
| Maximum Concurrent Jobs = 20
| SDAddress = 10.1.0.1
| }
|
| Director {
| Name = tigersclaw-dir
| Password = <removed>
| }
|
| Director {
| Name = tigersclaw-mon
| Password = <removed>
| Monitor = yes
| }
|
| Device {
| Name = FileStorage
| Media Type = File
| Archive Device = /srv/bacula/file
| LabelMedia = yes
| Random Access = yes
| AutomaticMount = yes
| RemovableMedia = no
| AlwaysOpen = no
| }
|
| Autochanger {
| Name = Overland-NEO2000
| Device = LTO1-Drive-1
| Device = LTO4-Drive-1
| Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
| Changer Device = /dev/sg4
| }
|
| Device {
| Name = LTO1-Drive-1
| Drive Index = 0
| Media Type = LTO-1
| Archive Device = /dev/nst1
| AutomaticMount = yes
| AlwaysOpen = yes
| RemovableMedia = yes
| RandomAccess = no
| AutoChanger = yes
| Maximum File Size = 2GB
| Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
| Spool Directory = "/srv/bacula/spool"
| }
|
| Device {
| Name = LTO4-Drive-1
| Drive Index = 1
| Media Type = LTO-4
| Archive Device = /dev/nst0
| AutomaticMount = yes
| AlwaysOpen = yes
| RemovableMedia = yes
| RandomAccess = no
| Maximum block size = 1MB
| Maximum File Size = 40GB
| AutoChanger = yes
| Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
| Spool Directory = "/srv/bacula/spool"
| }
|
| Messages {
| Name = Standard
| director = tigersclaw-dir = all
| }
JFTR: "tigersclaw" is the name of my server which runs (among other
things) the Bacula director, storage daemon and also the client where
the backup job in question comes from. (I have more jobs configured,
also from other clients, but they are way too small to even fill an LTO-1
tape.)
Also the output from a:
lsscsi -g
would be necessary.
No problem either:
| # lsscsi -g
| [0:0:1:0] tape HP Ultrium 4-SCSI H63H /dev/st0 /dev/sg3
| [1:0:0:0] disk ATA Samsung SSD 850 2B6Q /dev/sda /dev/sg0
| [2:0:0:0] disk ATA WDC WD40EFRX-68W 0A82 /dev/sdb /dev/sg1
| [3:0:0:0] disk ATA WDC WD40EFRX-68W 0A82 /dev/sdc /dev/sg2
| [7:0:0:0] mediumx OVERLAND NEO Series 0616 /dev/sch0 /dev/sg4
| [7:0:0:1] tape SEAGATE ULTRIUM06242-XXX 1603 /dev/st1 /dev/sg5
| #
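To make the listing above easier to read, here is a small sketch that groups the tape drives and the changer by SCSI host number. The parsing is an assumption based on the column layout shown here (address first, type second, device node and sg node last), and the function name is made up for illustration.

```python
from collections import defaultdict

# Output of "lsscsi -g" as quoted above.
LSSCSI_OUTPUT = """\
[0:0:1:0] tape HP Ultrium 4-SCSI H63H /dev/st0 /dev/sg3
[1:0:0:0] disk ATA Samsung SSD 850 2B6Q /dev/sda /dev/sg0
[2:0:0:0] disk ATA WDC WD40EFRX-68W 0A82 /dev/sdb /dev/sg1
[3:0:0:0] disk ATA WDC WD40EFRX-68W 0A82 /dev/sdc /dev/sg2
[7:0:0:0] mediumx OVERLAND NEO Series 0616 /dev/sch0 /dev/sg4
[7:0:0:1] tape SEAGATE ULTRIUM06242-XXX 1603 /dev/st1 /dev/sg5
"""


def tape_devices_by_host(text):
    """Group tape drives and medium changers by SCSI host number."""
    hosts = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("["):
            continue  # skip prompts and blank lines
        host = int(fields[0].strip("[]").split(":")[0])
        dev_type, node, sg = fields[1], fields[-2], fields[-1]
        if dev_type in ("tape", "mediumx"):
            hosts[host].append((dev_type, node, sg))
    return dict(hosts)


for host, devices in sorted(tape_devices_by_host(LSSCSI_OUTPUT).items()):
    print(f"host {host}: {devices}")
```

In this listing the changer and the Seagate LTO-1 drive share host 7, while the HP LTO-4 drive sits alone on host 0. That does not by itself prove anything about the library, but it is worth checking physically which drive the changer actually loads for each drive index.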
Note in general, if btape works then Bacula will work because the SD
uses the same subroutines that btape uses for reading/writing tapes.
Well, that's what I expected and that's why I'm so puzzled about this
error...
Consequently there may be some other problem. When the SD seems to be
stuck, you can probably get more information by doing:
bconsole
setdebug level=200 storage=<director's name for storage daemon> trace=1
mount
setdebug level=0 storage=<director's name for storage daemon> trace=0
then look at the trace file in your working directory to see what is
going on.
Here's the trace file from the beginning of the job until the point
where the first tape was full and Bacula got stuck:
https://suchanek.de/temp/tigersclaw-sd.trace (47kB)
And here is what happened when I manually cancelled the stuck job (which
worked) and tried to run a "release LTO4-Drive" command in bconsole (which
didn't work, i.e. Bacula got stuck here too):
https://suchanek.de/temp/tigersclaw-sd.trace.2 (3kB)
I hope you can find something useful in these debug files, because I'm
totally lost here...
Best regards
Sebastian
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users