This looks like a problem reported by the drive or the tape. I suggest setting
the "Alert Command" option in bacula-sd.conf to run something like

    /usr/sbin/smartctl -a /dev/nst0 -T verypermissive

(or use the tapeinfo program) to check for TapeAlert messages after every
backup.

__Martin

>>>>> On Mon, 17 Sep 2018 17:37:12 +0100, Kevin Hodges said:
>
> Martin
>
> found the following around the same time:
>
> Sep 9 10:54:58 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [deferred]
> Sep 9 10:54:58 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> Sep 9 10:54:59 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [current]
> Sep 9 10:54:59 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> Sep 9 10:55:00 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [current]
> Sep 9 10:55:00 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> Sep 9 10:55:00 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [current]
> Sep 9 10:55:00 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> Sep 9 10:55:01 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [current]
> Sep 9 10:55:01 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
>
> Kevin
>
> On Mon, 2018-09-17 at 17:28 +0100, Martin Simmons wrote:
> > The "ERR=Input/output error" can be caused by hardware problems, but I would
> > not expect it from a network problem. If you have the syslog
> > (e.g. /var/log/messages) from that time, I would check for errors there too.
> >
> > __Martin
> >
> > >>>>> On Mon, 17 Sep 2018 14:30:16 +0100, Kevin Hodges said:
> > >
> > > hi Martin
> > >
> > > found this in the log:
> > >
> > > Writing spooled data to Volume. Despooling 50,000,033,271 bytes ...
> > > 09-Sep 10:55 swlx1.rdg.ac.uk-sd2 JobId 36: Error: block.c:255 Write error at 1512:13125 on device "LTO-8" (/dev/nst0). ERR=Input/output error.
> > > 09-Sep 10:55 swlx1.rdg.ac.uk-sd2 JobId 36: Error: Error writing final EOF to tape. This Volume may not be readable.
> > > tape_dev.c:941 ioctl MTWEOF error on "LTO-8" (/dev/nst0).
> > > ERR=Input/output error.
> > >
> > > I've restarted the backup from scratch and so far it seems to have got
> > > past the same point of failure that occurred last time, so fingers
> > > crossed. There were some network issues around the time of the failure!
> > >
> > > Regards
> > >
> > > Kevin
> > >
> > > On Mon, 2018-09-17 at 14:13 +0100, Martin Simmons wrote:
> > > > When the director stopped at ~1.5TB, did it report any other messages
> > > > (e.g. I/O errors)?
> > > >
> > > > I suggest looking in the system logs / console for messages around
> > > > that time as well.
> > > >
> > > > __Martin
> > > >
> > > > >>>>> On Tue, 11 Sep 2018 10:30:31 +0100, Kevin Hodges said:
> > > > >
> > > > > hi
> > > > >
> > > > > I came across a problem recently after installing a new single tape
> > > > > drive for backups. This is an HPE LTO-8 Ultrium machine connected to a
> > > > > Red Hat Linux box: Linux swlx1.rdg.ac.uk 3.10.0-862.9.1.el7.x86_64
> > > > >
> > > > > The problem occurred whilst performing a backup that consists of several
> > > > > million files which are several TB in total size. The backup
> > > > > stopped after writing ~1.5TB with the director reporting the volume was
> > > > > full and asking for a new labelled volume. LTO-8 should take at least
> > > > > 12TB (native). This was a surprise but I thought it might be a tape
> > > > > problem so I unmounted the tape and tried to load a new tape to label
> > > > > it and mount it to continue but I could not load the new blank tape.
> > > > > It seemed like the machine continually tried to load the tape without
> > > > > success and I had to keep pressing the eject button to extract the
> > > > > tape.
> > > > >
> > > > > Thinking this might be a hardware problem I stopped the backup, shut down
> > > > > the bacula daemons and ran all the vendor tests, which came back
> > > > > reporting no errors. On restarting the bacula daemons I found I was
> > > > > able to load the tapes again and re-start the backup.
> > > > >
> > > > > So my question is: if this is not a hardware or tape problem, what
> > > > > prevents me loading a new tape and labelling it during an ongoing backup
> > > > > job? Is there some way to pause the backup to allow a new tape to be
> > > > > labelled?
> > > > >
> > > > > My storage config is:
> > > > >
> > > > > Device {
> > > > >   Name = LTO-8
> > > > >   Media Type = LTO-8
> > > > >   Archive Device = /dev/nst0
> > > > >   AutomaticMount = yes;
> > > > >   AlwaysOpen = yes;
> > > > >   RemovableMedia = yes;
> > > > >   RandomAccess = no;
> > > > >   AutoChanger = no
> > > > >   Spool Directory = /opt/bacula/working2
> > > > >   Maximum Spool Size = 100GB
> > > > >   Maximum Job Spool Size = 50GB
> > > > > }
> > > > >
> > > > > Should the AutomaticMount be set to 'no' to stop attempts to
> > > > > automatically mount any new tape even if it is not labelled?
> > > > >
> > > > > The issue of the tape being labelled full well before its capacity is
> > > > > still a mystery.
> > > > >
> > > > > Thanks for any help
> > > > >
> > > > > Kevin
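
For reference, the "Alert Command" suggestion at the top of this thread might
look roughly like the fragment below when added to the Device resource quoted
above. This is only a sketch: the directive name and the smartctl command line
are the ones mentioned in the reply, the rest is the posted config unchanged,
and where the command's output ends up (normally the job report) should be
checked against the Bacula manual for the installed version.

```
# bacula-sd.conf (sketch, not a tested configuration)
Device {
  Name = LTO-8
  Media Type = LTO-8
  Archive Device = /dev/nst0
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = no
  Spool Directory = /opt/bacula/working2
  Maximum Spool Size = 100GB
  Maximum Job Spool Size = 50GB
  # Added line: run smartctl against the drive after each job so that any
  # TapeAlert flags raised by the drive are reported. The device path is
  # hard-coded here on the assumption that /dev/nst0 is the only drive.
  Alert Command = "/usr/sbin/smartctl -a /dev/nst0 -T verypermissive"
}
```

The storage daemon needs to be restarted to pick up the change. If tapeinfo is
used instead, it will probably need the drive's SCSI generic device rather
than /dev/nst0, which is an assumption to verify on the machine itself.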