Re: [Bacula-users] tape problem

Martin Simmons Fri, 21 Sep 2018 04:25:49 -0700

This looks like a problem reported by the drive or the tape.  I suggest
setting the "Alert Command" option in bacula-sd.conf to run something like


/usr/sbin/smartctl -a /dev/nst0 -T verypermissive

(or use the tapeinfo program) to check for TapeAlert messages after every
backup.
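
A minimal sketch of how that could look, added to the Device resource Kevin posted further down and assuming the drive remains /dev/nst0. Bacula runs the Alert Command at the end of each job, after the device is released, and whatever it prints should appear in the job messages. The commented-out tapeinfo variant is an assumption: /dev/sg1 is a hypothetical SCSI generic node for the drive (lsscsi -g shows the real one), and the sh -c wrapper is there only because of the pipe.

```
Device {
  Name = LTO-8
  Media Type = LTO-8
  Archive Device = /dev/nst0
  # ... remaining directives as in the existing Device resource ...

  # Run after every job once the device is released, to report TapeAlert flags.
  Alert Command = "/usr/sbin/smartctl -a /dev/nst0 -T verypermissive"

  # Alternative with tapeinfo (from the mtx package). /dev/sg1 is a hypothetical
  # SCSI generic node for the drive; "|| true" keeps the exit status zero when
  # grep finds no TapeAlert lines.
  # Alert Command = "sh -c 'tapeinfo -f /dev/sg1 | grep TapeAlert || true'"
}
```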

__Martin

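Martin's earlier advice (quoted below) to check the syslog can be made a little more systematic. The sketch below is not part of Bacula: st_sense_summary is a hypothetical shell function that reads syslog text on standard input and counts the kernel's "Add. Sense:" messages per tape device, so a run of repeated errors like the Write append errors Kevin found stands out. It only matches the message format seen in Kevin's lines; other kernels may word them differently, and /var/log/messages is assumed as the log location (the RHEL 7 default).

```shell
#!/bin/sh
# st_sense_summary (hypothetical helper): count SCSI tape "Add. Sense" messages
# per device in syslog text read from standard input, most frequent first.
# Matches lines of the form:
#   ... kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
st_sense_summary() {
  sed -n 's/.*kernel: st [0-9:]*: \[\(st[0-9]*\)] Add\. Sense: \(.*\)$/\1: \2/p' |
    sort | uniq -c | sort -rn
}

# Example (log path is an assumption):
#   st_sense_summary < /var/log/messages
```

Each output line is a count followed by the device and the sense message, e.g. a count of 5 for "st0: Write append error" on Kevin's excerpt.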

>>>>> On Mon, 17 Sep 2018 17:37:12 +0100, Kevin Hodges said:
> 
> Martin
> 
> Found the following in the syslog around the same time:
> 
> 
> Sep  9 10:54:58 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [deferred]
> Sep  9 10:54:58 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> Sep  9 10:54:59 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [current]
> Sep  9 10:54:59 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> Sep  9 10:55:00 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [current]
> Sep  9 10:55:00 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> Sep  9 10:55:00 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [current]
> Sep  9 10:55:00 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> Sep  9 10:55:01 swlx1 kernel: st 1:0:0:0: [st0] Sense Key : Medium Error [current]
> Sep  9 10:55:01 swlx1 kernel: st 1:0:0:0: [st0] Add. Sense: Write append error
> 
> Kevin
> 
> On Mon, 2018-09-17 at 17:28 +0100, Martin Simmons wrote:
> > The "ERR=Input/output error" can be caused by hardware problems, but I
> > would not expect it from a network problem.  If you have the syslog
> > (e.g. /var/log/messages) from that time, I would check for errors
> > there too.
> > 
> > __Martin
> > 
> > 
> > > > > > > On Mon, 17 Sep 2018 14:30:16 +0100, Kevin Hodges said:
> > > 
> > > Hi Martin
> > >
> > > Found this in the log:
> > > 
> > > Writing spooled data to Volume. Despooling 50,000,033,271 bytes ...
> > > 09-Sep 10:55 swlx1.rdg.ac.uk-sd2 JobId 36: Error: block.c:255 Write error at 1512:13125 on device "LTO-8" (/dev/nst0). ERR=Input/output error.
> > > 09-Sep 10:55 swlx1.rdg.ac.uk-sd2 JobId 36: Error: Error writing final EOF to tape. This Volume may not be readable.
> > > tape_dev.c:941 ioctl MTWEOF error on "LTO-8" (/dev/nst0). ERR=Input/output error.
> > > 
> > > I've restarted the backup from scratch and so far it seems to have
> > > got past the point where it failed last time, so fingers crossed.
> > > There were some network issues around the time of the failure!
> > > 
> > > Regards
> > > 
> > > Kevin
> > > 
> > > On Mon, 2018-09-17 at 14:13 +0100, Martin Simmons wrote:
> > > > When the director stopped at ~1.5TB, did it report any other
> > > > messages (e.g. I/O errors)?
> > > >
> > > > I suggest looking in the system logs / console for messages
> > > > around that time as well.
> > > > 
> > > > __Martin
> > > > 
> > > > 
> > > > > > > > > On Tue, 11 Sep 2018 10:30:31 +0100, Kevin Hodges said:
> > > > > 
> > > > > Hi
> > > > >
> > > > > I came across a problem recently after installing a new single
> > > > > tape drive for backups. This is an HPE LTO-8 Ultrium drive
> > > > > connected to a Red Hat Linux box:
> > > > > Linux swlx1.rdg.ac.uk 3.10.0-862.9.1.el7.x86_64
> > > > > 
> > > > > The problem occurred whilst performing a backup of several
> > > > > million files, several TB in total. The backup stopped after
> > > > > writing ~1.5TB, with the director reporting that the volume was
> > > > > full and asking for a new labelled volume. LTO-8 should take at
> > > > > least 12TB (native). This was a surprise, but I thought it might
> > > > > be a tape problem, so I unmounted the tape and tried to load a new
> > > > > tape to label and mount it to continue, but I could not load the
> > > > > new blank tape. The drive seemed to keep retrying the load without
> > > > > success, and I had to keep pressing the eject button to extract
> > > > > the tape.
> > > > > 
> > > > > Thinking this might be a hardware problem, I stopped the backup,
> > > > > shut down the Bacula daemons and ran all the vendor tests, which
> > > > > reported no errors. On restarting the Bacula daemons I found I was
> > > > > able to load the tapes again and restart the backup.
> > > > > 
> > > > > So my question is: if this is not a hardware or tape problem, what
> > > > > prevents me from loading and labelling a new tape during an
> > > > > ongoing backup job? Is there some way to pause the backup to allow
> > > > > a new tape to be labelled?
> > > > > 
> > > > > My storage config is:
> > > > > 
> > > > > Device {
> > > > >   Name = LTO-8
> > > > >   Media Type = LTO-8
> > > > >   Archive Device = /dev/nst0
> > > > >   AutomaticMount = yes;
> > > > >   AlwaysOpen = yes;
> > > > >   RemovableMedia = yes;
> > > > >   RandomAccess = no;
> > > > >   AutoChanger = no
> > > > >   Spool Directory = /opt/bacula/working2
> > > > >   Maximum Spool Size = 100GB
> > > > >   Maximum Job Spool Size = 50GB
> > > > > }
> > > > > 
> > > > > Should AutomaticMount be set to 'no' to stop attempts to
> > > > > automatically mount any new tape, even if it is not labelled?
> > > > > 
> > > > > Why the tape was marked full well before reaching its capacity is
> > > > > still a mystery.
> > > > > 
> > > > > Thanks for any help
> > > > > 
> > > > > Kevin

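As a rough cross-check on the ~1.5TB figure: the "1512:13125" in the write error Kevin quoted is Bacula's file:block position on the volume. The Device resource above sets neither Maximum File Size nor Maximum Block Size, so, assuming the defaults described in the Bacula manual (about 1 GB per file and 64512-byte blocks; 1 GB is taken here as 10^9 bytes, itself an assumption), the position can be turned into an approximate byte count. approx_volume_offset is a hypothetical helper written for this note, not a Bacula command.

```shell
#!/bin/sh
# approx_volume_offset (hypothetical helper): estimate how many bytes had been
# written to a tape volume from a Bacula "file:block" position such as 1512:13125.
# ASSUMPTIONS: Maximum File Size left at its ~1 GB default (taken as
# 1,000,000,000 bytes) and the default 64512-byte block size. Pass other
# values as the second and third arguments if bacula-sd.conf overrides them.
approx_volume_offset() {
  pos=$1
  file_size=${2:-1000000000}
  block_size=${3:-64512}
  file=${pos%%:*}   # part before the colon: tape file number
  block=${pos#*:}   # part after the colon: block within that file
  echo $(( file * file_size + block * block_size ))
}
```

Run as approx_volume_offset 1512:13125, it prints 1512846720000, roughly 1.5 TB. That agrees with where the job stopped and is far short of LTO-8's 12 TB native capacity.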

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
