Re: [Bacula-users] bacula copy job fails when tape is full and needs to change

Andras Horvai Mon, 03 Dec 2018 10:09:03 -0800

Hi,

I know I am using an old version of Bacula (7.0.5) and, as I mentioned, we
are working hard to upgrade it to 9.0.8 (shipped with the latest Ubuntu).
But I set up the heartbeat and it did not help...
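
For anyone following along, a heartbeat setup of the kind discussed in this
thread might look roughly like the sketch below. Only the directive name
(Heartbeat Interval) and the value 300 come from Kern's reply; the names
backup-sd and backup1 are taken from the logs, backup-fd is a hypothetical
name, and which resources need the directive is an assumption based on the
Bacula manual, not a copy of the configuration actually in use here. All
other directives in each resource are omitted.

```
# bacula-sd.conf (sketch: only the heartbeat line is shown)
Storage {
  Name = backup-sd
  Heartbeat Interval = 300   # seconds; value from Kern's reply
}

# bacula-fd.conf
FileDaemon {
  Name = backup-fd           # hypothetical name, not from the thread
  Heartbeat Interval = 300
}

# bacula-dir.conf
Director {
  Name = backup1
  Heartbeat Interval = 300
}
```

Each daemon has to be restarted (or reloaded, where supported) before a
changed interval takes effect.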


and the interesting thing is this (when I check the director status):

 JobId  Level    Files      Bytes   Status   Finished        Name
====================================================================
  4960  Diff    300,927    203.2 G  Error    03-Dec-18 17:42 server-job
  4959  Full    300,927    203.2 G  Error    03-Dec-18 17:42 Archive
  4962  Diff    300,927    203.2 G  OK       03-Dec-18 18:36 server-job
  4961  Full    300,927    203.2 G  OK       03-Dec-18 18:36 Archive


So when I ran the Archive job, it started job 4959 and attached job 4960 to
do the copy from local disk (full backup jobs) to tape. (Everything is local.)
But the tape got full, so we needed to replace it... and then the job failed.
It seems like jobs cannot span tapes.
Then, with a new tape (where we did not have to replace the tape), the job
was fine.
I think this is the key part of the log, but unfortunately I have no clue how
to solve it:

03-Dec 17:06 backup-sd JobId 4960: Error: The Volume=WORMW-1246 on
device="LTO-4" (/dev/nst0) appears to be unlabeled.
03-Dec 17:07 backup-sd JobId 4960: Labeled new Volume "WORMW-1246" on tape
device "LTO-4" (/dev/nst0).
03-Dec 17:07 backup-sd JobId 4960: Wrote label to prelabeled Volume
"WORMW-1246" on tape device "LTO-4" (/dev/nst0)
03-Dec 17:07 backup-sd JobId 4960: New volume "WORMW-1246" mounted on
device "LTO-4" (/dev/nst0) at 03-Dec-2018 17:07.

*03-Dec 17:41 backup-sd JobId 4960: Fatal error: append.c:149 Error reading
data header from FD. n=-2 msglen=0 ERR=Connection reset by peer*
03-Dec 17:41 backup-sd JobId 4960: Elapsed time=00:46:48, Transfer rate=72.38 M
Bytes/second


And here it looks like the Archive job (4959) does span the tapes:

03-Dec 17:42 backup1 JobId 4959: Error: Bacula backup1 7.0.5 (28Jul14):
  Build OS:               x86_64-pc-linux-gnu ubuntu 16.04
  Prev Backup JobId:      4933
  Prev Backup Job:        server-job.2018-12-01_02.00.00_42
  New Backup JobId:       4960
  Current JobId:          4959
  Current Job:            Archive.2018-12-03_16.02.06_10
  Backup Level:           Full
  Client:                 None
  FileSet:                "None" 2017-06-19 09:00:00
  Read Pool:              "ServersWeeklyFullFile" (From Job resource)
  Read Storage:           "File" (From Pool resource)
  Write Pool:             "TapeArchive" (From Pool's NextPool resource)
  Write Storage:          "LTO-4" (From Pool's NextPool resource)
  Catalog:                "MyCatalog" (From Client resource)
  Start time:             03-Dec-2018 16:02:09
  End time:               03-Dec-2018 17:42:00
  Elapsed time:           1 hour 39 mins 51 secs
  Priority:               13
  SD Files Written:       300,927
  SD Bytes Written:       203,268,700,917 (203.2 GB)
  Rate:                   33929.0 KB/s

*  Volume name(s):         WORMW-1245|WORMW-1246*
  Volume Session Id:      66
  Volume Session Time:    1543248344
  Last Volume Bytes:      151,096,393,728 (151.0 GB)
  SD Errors:              0
  SD termination status:  OK
  Termination:            *** Copying Error ***

Any suggestions are welcome! Thanks for your help!

Andras


On Mon, Nov 26, 2018 at 12:25 PM Andras Horvai <andras.hor...@gmail.com>
wrote:

> Thanks Kern! I will dig deeper into the documentation. First I will try
> setting up the heartbeat in the SD's config!
>
> On Mon, Nov 26, 2018 at 11:33 AM Kern Sibbald <k...@sibbald.com> wrote:
>
>> Oh, there are at least five different places to set up Heart Beat Interval
>> (Dir, SD, and FD).  Unfortunately my memory is not good enough to remember
>> them all. Please ask others or see the documentation ...
>>
>> The easiest way is to get on a current version -- e.g. 9.2.2 where it is
>> done by default.
>>
>> Best regards,
>> Kern
>>
>> On 11/26/18 11:13 AM, Andras Horvai wrote:
>>
>> Hello Kern,
>>
>> yes you are right I am using bacula 7.0.5 shipped with Ubuntu 16.04.
>> Where should I setup heartbeat interval? On SD's or FD's config? Or both?
>>
>> Thanks for your help!
>>
>> Andras
>>
>> On Mon, Nov 26, 2018 at 10:56 AM Kern Sibbald <k...@sibbald.com> wrote:
>>
>>> Hello,
>>>
>>> If I remember right you are running on a *very* old Bacula, and the
>>> problem seems to be that the backup takes more than 2 hours.  One of your
>>> comm lines (SD <-> FD) times out.  I mention your old version because newer
>>> Bacula versions automatically fix this problem by turning on Heart Beat
>>> Interval = 300, which is very likely to resolve your problem.
>>>
>>> Best regards,
>>> Kern
>>>
>>> On 11/26/18 10:34 AM, Andras Horvai wrote:
>>>
>>> Hi Tilman,
>>>
>>> thank you for your answer! But unfortunately the firewall cannot be the
>>> problem here :)
>>> The problem happens only with Copy Jobs. The SD and the FD are on the
>>> same machine. There is no firewall on the machine.
>>> So what I am doing is the following:
>>>
>>> During the weekend I do a full backup with the backup server to file
>>> storage on the backup server. Then, starting from Monday, I run a Copy Job
>>> from the backup server to a tape device connected to the backup server.
>>> This works pretty well until the tape gets full. When the tape gets full,
>>> Bacula asks for another tape.
>>> We replace the tape, so the job should continue (as expected), but then at
>>> the end we get the job error... So I am puzzled about what is wrong.
>>>
>>> Please feel free to share your ideas...
>>>
>>> Thanks,
>>>
>>> Andras
>>>
>>> On Sun, Nov 25, 2018 at 10:28 PM Tilman Schmidt <til...@imap.cc> wrote:
>>>
>>>> Hi Andras,
>>>>
>>>> is there a firewall between the client and the SD?
>>>> The message
>>>>
>>>> > 20-Nov 12:25 backup-sd JobId 4845: Fatal error: append.c:223 Network
>>>> > error reading from FD. ERR=Connection reset by peer
>>>>
>>>> looks suspiciously like a firewall killing the FD - SD connection
>>>> because it sees it as idle.
>>>>
>>>> HTH
>>>> Tilman
>>>>
>>>> Am 22.11.2018 um 16:04 schrieb Andras Horvai:
>>>> > Dear list,
>>>> >
>>>> > I have the following problem:
>>>> > We use copy jobs to copy weekly full backups to WORM tape, but when a
>>>> > tape gets full and needs to be changed, the copy job fails. Bacula says
>>>> > intervention is needed, so we put a new tape in the tape drive. What
>>>> > can be the problem?
>>>> >
>>>> > Copy job report:
>>>> > 20-Nov 12:25 backup-sd JobId 4838: End of Volume at file 0 on device
>>>> > "FileStorage" (/backup), Volume "FILEW-0542"
>>>> > 20-Nov 12:25 backup-sd JobId 4838: End of all volumes.
>>>> > 20-Nov 12:25 backup-sd JobId 4838: Elapsed time=02:45:29, Transfer
>>>> > rate=42.14 M Bytes/second
>>>> > 20-Nov 12:25 backup1 JobId 4838: Error: Bacula backup1 7.0.5 (28Jul14):
>>>> >   Build OS:               x86_64-pc-linux-gnu ubuntu 16.04
>>>> >   Prev Backup JobId:      4837
>>>> >   Prev Backup Job:        db1-job.2018-11-19_23.09.19_03
>>>> >   New Backup JobId:       4845
>>>> >   Current JobId:          4838
>>>> >   Current Job:            Archive.2018-11-20_07.59.53_05
>>>> >   Backup Level:           Full
>>>> >   Client:                 None
>>>> >   FileSet:                "None" 2017-06-19 09:00:00
>>>> >   Read Pool:              "ServersWeeklyFullFile" (From Job resource)
>>>> >   Read Storage:           "File" (From Pool resource)
>>>> >   Write Pool:             "TapeArchive" (From Pool's NextPool resource)
>>>> >   Write Storage:          "LTO-4" (From Pool's NextPool resource)
>>>> >   Catalog:                "MyCatalog" (From Client resource)
>>>> >   Start time:             20-Nov-2018 09:39:31
>>>> >   End time:               20-Nov-2018 12:25:04
>>>> >   Elapsed time:           2 hours 45 mins 33 secs
>>>> >   Priority:               13
>>>> >   SD Files Written:       4,792
>>>> >   SD Bytes Written:       418,488,802,122 (418.4 GB)
>>>> >   Rate:                   42131.2 KB/s
>>>> >   Volume name(s):         WORMW-1242|WORMW-1243
>>>> >   Volume Session Id:      9
>>>> >   Volume Session Time:    1542631131
>>>> >   Last Volume Bytes:      33,060,787,200 (33.06 GB)
>>>> >   SD Errors:              0
>>>> >   SD termination status:  OK
>>>> >   Termination:            *** Copying Error ***
>>>> >
>>>> > 20-Nov 09:39 backup1 JobId 4845: Using Device "LTO-4" to write.
>>>> > 20-Nov 12:01 backup-sd JobId 4845: End of Volume "WORMW-1242" at
>>>> > 386:27137 on device "LTO-4" (/dev/nst0). Write of 64512 bytes got -1.
>>>> > 20-Nov 12:01 backup-sd JobId 4845: Re-read of last block succeeded.
>>>> > 20-Nov 12:01 backup-sd JobId 4845: End of medium on Volume "WORMW-1242"
>>>> > Bytes=764,853,046,272 Blocks=11,855,980 at 20-Nov-2018 12:01.
>>>> > 20-Nov 12:01 backup1 JobId 4845: Created new Volume="WORMW-1243",
>>>> > Pool="TapeArchive", MediaType="LTO-4" in catalog.
>>>> > 20-Nov 12:01 backup-sd JobId 4845: Please mount append Volume
>>>> > "WORMW-1243" or label a new one for:
>>>> >     Job:          db1-job.2018-11-20_07.59.54_12
>>>> >     Storage:      "LTO-4" (/dev/nst0)
>>>> >     Pool:         TapeArchive
>>>> >     Media type:   LTO-4
>>>> > 20-Nov 12:15 backup-sd JobId 4845: Error: The Volume=WORMW-1243 on
>>>> > device="LTO-4" (/dev/nst0) appears to be unlabeled.
>>>> > 20-Nov 12:15 backup-sd JobId 4845: Labeled new Volume "WORMW-1243" on
>>>> > tape device "LTO-4" (/dev/nst0).
>>>> > 20-Nov 12:15 backup-sd JobId 4845: Wrote label to prelabeled Volume
>>>> > "WORMW-1243" on tape device "LTO-4" (/dev/nst0)
>>>> > 20-Nov 12:15 backup-sd JobId 4845: New volume "WORMW-1243" mounted on
>>>> > device "LTO-4" (/dev/nst0) at 20-Nov-2018 12:15.
>>>> > 20-Nov 12:25 backup-sd JobId 4845: Fatal error: append.c:223 Network
>>>> > error reading from FD. ERR=Connection reset by peer
>>>> > 20-Nov 12:25 backup-sd JobId 4845: Elapsed time=02:31:15, Transfer
>>>> > rate=46.11 M Bytes/second
>>>> >
>>>> > Regarding copy jobs the FD and the SD are on the same machine.
>>>> >
>>>> > we are using:
>>>> >
>>>> > Distributor ID: Ubuntu
>>>> > Description:    Ubuntu 16.04.4 LTS
>>>> > Release:        16.04
>>>> > Codename:       xenial
>>>> >
>>>> > bacula 7.0.5
>>>> >
>>>> > Tape drive: HP       Ultrium 4-SCSI
>>>> >
>>>> >
>>>> > Thanks for help,
>>>> >
>>>> > Andras
>>>> >
>>>>
>>>>
>>>> _______________________________________________
>>>> Bacula-users mailing list
>>>> Bacula-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>>

