Hello Dan,
Thank you for your kind remarks.
Just for information: In reality Bacula uses the block sizes you
configure (or the default). You can cause Bacula to write fixed-size
blocks, but modern tape drives (LTO and similar) handle variable sizes
quite well, so by default, Bacula will attempt to block everything to
the maximum block size set, but if there is a reason (e.g. last block of
a job), Bacula will write smaller blocks. This has never been a problem
to the best of my knowledge. I suspect that tar does not check all
return statuses as well as it should. Bacula's concept is to do
everything possible (not all is practical though) to ensure that writes
are valid so that restores will work.
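As a rough illustration of the blocking behavior described above (not Bacula's actual code), the following Python sketch packs a stream of bytes into blocks of at most a maximum size and lets the final block be shorter. The function name split_into_blocks is hypothetical, and the 64,512-byte value is an assumed default maximum block size; it is consistent with the Bytes and Blocks figures in the job log quoted further down this thread.

```python
from typing import Iterator

# Assumed default maximum block size in bytes (an assumption, not read
# from any Bacula configuration).
MAX_BLOCK_SIZE = 64_512


def split_into_blocks(data: bytes, max_block_size: int = MAX_BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive blocks of at most max_block_size bytes.

    Every block except possibly the last is exactly max_block_size long;
    the last block carries whatever is left over (e.g. the end of a job).
    """
    if max_block_size <= 0:
        raise ValueError("max_block_size must be positive")
    for offset in range(0, len(data), max_block_size):
        yield data[offset:offset + max_block_size]


if __name__ == "__main__":
    job_data = bytes(150_000)  # stand-in for one job's data
    sizes = [len(block) for block in split_into_blocks(job_data)]
    print(sizes)  # [64512, 64512, 20976]
```

Only the last block comes out short, which is the variable-size case mentioned above; a fixed-size scheme would pad that block to full length instead.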
Best regards,
Kern
On 06/11/2018 09:44 PM, Stieneke, Dan wrote:
Thanks all for your input & confirming it pretty much had to be a hardware
problem.
In the interest of completeness / helping the next person who's googling for
answers, reseating the SCSI card fixed it - it just completed a 900GB backup
w/out any problems, onto one of the same tapes that it had rejected before
after only a few GB.
Now that the major problem has been solved, I'm still curious about why Bacula ran into
the (real!) hardware issue where tar did not. The tar tape was software compressed
& then software encrypted, so the restore had to successfully decrypt & then
decompress the data, so there couldn't have been any bit errors on that tar tape. This
was true four months ago, with the sketchy cable, and this time, with the SCSI card
that needed reseating. Are fixed-size (tar) blocks just a little bit more robust than
variable-sized (Bacula) blocks?
And thanks, Kern, for an outstanding product.
Dan Stieneke
IT Specialist
USDA - ARS - NWISRL
3793 N 3600 E
Kimberly, ID 83341
208/423-6519
-----Original Message-----
From: Kern Sibbald [mailto:k...@sibbald.com]
Sent: Saturday, June 9, 2018 4:16 AM
To: Stieneke, Dan <dan.stien...@ars.usda.gov>
Cc: bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Bacula h/w write fails, but tar writes w/out error?
Hello,
Well, Bacula does not check what was written from time to time, but when it reaches the
end of the tape, Bacula will re-read the last block written to make sure it corresponds
to what it wrote, then it writes a double end of file. In your case, something is going
wrong -- either there is a hardware error, or there is really an end of tape marker that
is telling Bacula that the tape is full. From what you write, it looks more like a
hardware error, and the kernel logs that you show below indicate that something serious
is wrong with your tape drive. While Bacula is writing you should never see such
messages, and when they occur, Bacula will receive a write error. Everything is
consistent with a hardware problem. You may get a better idea of what is going on by
running the "btape test" command. Please see the manual for instructions on
how to run it. I recommend both the test and the fill commands. Note: both of these
commands will write on the tape. Prior to using a tape with btape, if it has been
labeled by Bacula, you should rewind the tape and write one or two eof marks at the
beginning so that btape will take it as a blank tape.
If both btape "test" and "fill" work, you should not have problems with failing
Bacula backups. If either one of those tests fails, you must fix it prior to trying to back up to
tape with Bacula.
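The preparation steps described above could be scripted roughly as follows. This is only a dry-run sketch: it prints the commands rather than running them, because they overwrite the tape. The device path is the one shown in the job log in this thread; the configuration path /etc/bacula/bacula-sd.conf and the helper name btape_prep_commands are assumptions. The "test" and "fill" commands are then typed at the btape prompt once btape has started.

```python
import shlex
from typing import List


def btape_prep_commands(device: str, sd_conf: str, eof_marks: int = 2) -> List[List[str]]:
    """Return the commands to blank the start of a labeled tape and start btape.

    Nothing is executed here; the caller decides whether to run them.
    """
    if eof_marks not in (1, 2):
        raise ValueError("write one or two EOF marks")
    return [
        ["mt", "-f", device, "rewind"],                # rewind the tape
        ["mt", "-f", device, "weof", str(eof_marks)],  # write EOF mark(s) at the start
        ["btape", "-c", sd_conf, device],              # then type "test" or "fill"
    ]


if __name__ == "__main__":
    # Device path as it appears in the Bacula log; the config path is assumed.
    device = "/dev/tape/by-id/scsi-1IBM_ULTRIUM-TD4_1310010391-nst"
    for argv in btape_prep_commands(device, "/etc/bacula/bacula-sd.conf"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing first makes it easy to check the device path before anything destructive is run by hand.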
Best regards,
Kern
On 06/08/2018 07:48 PM, Stieneke, Dan wrote:
@ Dan Langille - yes, I think it is an issue with the tape drive, but only
Bacula runs into it; tar does not.
@Martin Simmons - of course I should have checked/reported the log, sorry.
=======BEGIN SYSLOG=======
Jun 4 08:06:11 SRVName kernel: [410468.465702] st0: Sense Key : Unit Attention [current]
Jun 4 08:06:11 SRVName kernel: [410468.465714] st0: Add. Sense: Power on, reset, or bus device reset occurred
Jun 4 08:10:47 SRVName kernel: [410744.629015] st0: Sense Key : Unit Attention [current]
Jun 4 08:10:47 SRVName kernel: [410744.629026] st0: Add. Sense: Power on, reset, or bus device reset occurred
Jun 4 08:14:02 SRVName kernel: [410939.819168] st0: Sense Key : Unit Attention [current]
Jun 4 08:14:02 SRVName kernel: [410939.819180] st0: Add. Sense: Power on, reset, or bus device reset occurred
Jun 4 08:16:57 SRVName kernel: [411114.538975] st0: Sense Key : Unit Attention [current]
Jun 4 08:16:57 SRVName kernel: [411114.538988] st0: Add. Sense: Power on, reset, or bus device reset occurred
=======END SYSLOG=======
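To see how often the drive reported a reset, the kernel messages above can be summarized with a short script. This is a sketch only: the regular expression assumes the exact st0 message layout shown in this log, and the function name reset_events is hypothetical.

```python
import re
from typing import List, Tuple

# The "Add. Sense" lines from the syslog excerpt, one entry per line.
SAMPLE = """\
Jun 4 08:06:11 SRVName kernel: [410468.465714] st0: Add. Sense: Power on, reset, or bus device reset occurred
Jun 4 08:10:47 SRVName kernel: [410744.629026] st0: Add. Sense: Power on, reset, or bus device reset occurred
Jun 4 08:14:02 SRVName kernel: [410939.819180] st0: Add. Sense: Power on, reset, or bus device reset occurred
Jun 4 08:16:57 SRVName kernel: [411114.538988] st0: Add. Sense: Power on, reset, or bus device reset occurred
"""

ADD_SENSE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<dev>st\d+): Add\. Sense: (?P<msg>.+)$"
)


def reset_events(log_text: str) -> List[Tuple[str, float]]:
    """Return (timestamp, kernel uptime in seconds) for each reported reset."""
    events = []
    for line in log_text.splitlines():
        match = ADD_SENSE.match(line.strip())
        if match and "reset" in match.group("msg"):
            events.append((match.group("stamp"), float(match.group("uptime"))))
    return events


if __name__ == "__main__":
    events = reset_events(SAMPLE)
    print(f"{len(events)} reset events")
    # Seconds between consecutive resets, from the kernel uptime stamps.
    for (_, earlier), (stamp, later) in zip(events, events[1:]):
        print(f"{stamp}: {later - earlier:.0f} s after the previous reset")
```

In this excerpt the resets recur every few minutes, which fits a device that keeps dropping off the bus rather than a single bad spot on one tape.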
Googling for those entries I found
http://bacula.10910.n7.nabble.com/Bacula-tapes-marked-FULL-too-early-VolBytes-too-low-td58881i20.html.
It describes a similar issue (but no report of tar); the thread ended with "similar problem went away with
replaced drive" & "get your drive tested".
From the Bacula log ("Error: Re-read of last block OK, but block numbers differ.
Read block=990557 Want block=990558.") it looks like Bacula checks up on what has
been written every so often. I don't think tar does that; it just streams to tape. If my
card/cable/tape is only slightly flaky, is it reasonable to think that this extra work
pushes it over the edge? Or am I barking up the wrong tree?
Thanks,
Dan Stieneke
----- from Dan Langille -----
If it is all tapes, is the issue with the tape drive?
----- from Martin Simmons -----
Check the syslog and system console for error messages about the tape device
(since Bacula saw Input/output error, that usually means some error on the
device).
On Thu, 7 Jun 2018 15:38:13 +0000, Stieneke, Dan said:
The job ate through 4 tapes, with only 2 - 60GB on each tape. Then it hit
recycle limits and was asking for more media.
These are used tapes, but I can't see 4 consecutive tapes going bad at the same
time.
Incidentally, this is the same behavior I saw 4 months ago, and at that time I
did test bacula to a brand-new tape, which also failed quickly.
Thanks,
Dan
From: Josh Fisher [mailto:jfis...@pvct.com]
Sent: Wednesday, June 6, 2018 5:18 AM
To: Stieneke, Dan <dan.stien...@ars.usda.gov>;
'bacula-users@lists.sourceforge.net'
<bacula-users@lists.sourceforge.net>
Subject: Re: [Bacula-users] Bacula h/w write fails, but tar writes w/out error?
On 6/5/2018 3:45 PM, Stieneke, Dan wrote:
Ubuntu 16.04, Bacula 5.2.6, single-drive autoloader, all running Bacula
trouble-free for years.
Four months ago I got some errors in Bacula that looked like h/w errors,
although jobs using tar on the same drive ran without error. I had suspicions
about a cable, and when I replaced it everything returned to normal, until now,
when I'm getting the same kinds of errors.
Tar works on the same drive, but what about on the same tape? How do you know
you are not seeing bad tapes?
The relevant part of "messages" is:
= = = = = = = = = = = = = = = = = =
05-Jun 09:17 xxx-sd JobId 794: Error: block.c:577 Write error at 12:60511 on device
"Ultrium-TD4" (/dev/tape/by-id/scsi-1IBM_ULTRIUM-TD4_1310010391-nst).
ERR=Input/output error.
05-Jun 09:18 xxx-sd JobId 794: Error: Re-read of last block OK, but block
numbers differ. Read block=990557 Want block=990558.
05-Jun 09:18 xxx-sd JobId 794: End of medium on Volume "A00030L4"
Bytes=63,902,942,208 Blocks=990,558 at 05-Jun-2018 09:18.
05-Jun 09:18 xxx-sd JobId 794: 3307 Issuing autochanger "unload slot 16, drive
0" command.
= = = = = = = = = = = = = = = = = =
As you can see, it had an error after about 64GB (of an 800GB native / 1600GB
compressed tape).
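The figures above can be checked directly from the "End of medium" log line. The sketch below is illustrative only: it assumes that the 800GB native capacity means 800 * 10^9 bytes, and the function name parse_end_of_medium is hypothetical.

```python
import re
from typing import Tuple

# The "End of medium" message from the log above, joined onto one line.
END_OF_MEDIUM = (
    'End of medium on Volume "A00030L4" '
    "Bytes=63,902,942,208 Blocks=990,558 at 05-Jun-2018 09:18."
)

# Assumption: "800GB native" is decimal gigabytes.
NATIVE_CAPACITY_BYTES = 800 * 10**9


def parse_end_of_medium(line: str) -> Tuple[int, int]:
    """Extract (bytes written, blocks written) from an End of medium message."""
    match = re.search(r"Bytes=([\d,]+)\s+Blocks=([\d,]+)", line)
    if match is None:
        raise ValueError("not an End of medium line")
    volume_bytes = int(match.group(1).replace(",", ""))
    blocks = int(match.group(2).replace(",", ""))
    return volume_bytes, blocks


if __name__ == "__main__":
    volume_bytes, blocks = parse_end_of_medium(END_OF_MEDIUM)
    print(f"written: {volume_bytes / 10**9:.1f} GB")
    print(f"share of native capacity: {volume_bytes / NATIVE_CAPACITY_BYTES:.1%}")
    print(f"average bytes per block: {volume_bytes / blocks:.1f}")
```

The tape was marked full at roughly 8% of its native capacity, and the average comes out just over 64,512 bytes per block, which suggests that nearly every block was written at a single size.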
I've cleaned the drive. And again, backups made with tar record without error
and restore without error.
Any ideas?
Thanks,
Dan Stieneke
IT Specialist
USDA - ARS - NWISRL
3793 N 3600 E
Kimberly, ID 83341
This electronic message contains information generated by the USDA solely for
the intended recipients. Any unauthorized interception of this message or the
use or disclosure of the information it contains may violate the law and
subject the violator to civil or criminal penalties. If you believe you have
received this message in error, please notify the sender and delete the email
immediately.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users