Hi,

I know I am using an old version of Bacula (7.0.5), and as I mentioned we are working hard to upgrade it to 9.0.8 (shipped with the latest Ubuntu). But I set up the heartbeat and it did not help.
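For reference, Heartbeat Interval is a directive that can appear in more than one daemon's configuration. The fragment below is only a sketch of where it can go, not the configuration actually used on this server. The resource names backup1-dir and backup1-fd are placeholders (assumptions), backup-sd and LTO-4 are the names that appear in the job logs, and 300 is the value mentioned later in this thread. Other directives in each resource are left out.

```
# bacula-dir.conf (Director side)
Director {
  Name = backup1-dir            # placeholder name, an assumption
  Heartbeat Interval = 300      # seconds
  # other directives unchanged
}
Storage {
  Name = LTO-4                  # storage name as shown in the job report
  Heartbeat Interval = 300
  # other directives unchanged
}
Client {
  Name = backup1-fd             # placeholder name, an assumption
  Heartbeat Interval = 300
  # other directives unchanged
}

# bacula-sd.conf (Storage daemon)
Storage {
  Name = backup-sd              # SD name as shown in the logs
  Heartbeat Interval = 300
  # other directives unchanged
}

# bacula-fd.conf (File daemon)
FileDaemon {
  Name = backup1-fd             # placeholder name, an assumption
  Heartbeat Interval = 300
  # other directives unchanged
}
```

Each daemon has to be restarted (or the Director reloaded) before a changed value takes effect, and the manual for the installed version should be checked to confirm which of these resources accept the directive.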
The interesting thing is what the director status shows:

 JobId  Level    Files      Bytes   Status   Finished         Name
====================================================================
  4960  Diff    300,927    203.2 G  Error    03-Dec-18 17:42  server-job
  4959  Full    300,927    203.2 G  Error    03-Dec-18 17:42  Archive
  4962  Diff    300,927    203.2 G  OK       03-Dec-18 18:36  server-job
  4961  Full    300,927    203.2 G  OK       03-Dec-18 18:36  Archive

So when I ran the Archive job, it started job 4959, which spawned job 4960 to do the copy from local disk (the full backup jobs) to tape. (Everything is local.) But the tape got full, so we had to replace it, and then the job failed. It seems like jobs cannot span across tapes. On the rerun with the new tape (where we did not have to replace the tape), the job was fine.

I think this is the key part of the log, but unfortunately I have no clue how to solve it:

03-Dec 17:06 backup-sd JobId 4960: Error: The Volume=WORMW-1246 on device="LTO-4" (/dev/nst0) appears to be unlabeled.
03-Dec 17:07 backup-sd JobId 4960: Labeled new Volume "WORMW-1246" on tape device "LTO-4" (/dev/nst0).
03-Dec 17:07 backup-sd JobId 4960: Wrote label to prelabeled Volume "WORMW-1246" on tape device "LTO-4" (/dev/nst0)
03-Dec 17:07 backup-sd JobId 4960: New volume "RCWORMW-1246" mounted on device "LTO-4" (/dev/nst0) at 03-Dec-2018 17:07.
*03-Dec 17:41 backup-sd JobId 4960: Fatal error: append.c:149 Error reading data header from FD. n=-2 msglen=0 ERR=Connection reset by peer*
03-Dec 17:41 backup-sd JobId 4960: Elapsed time=00:46:48, Transfer rate=72.38 M Bytes/second

And here it looks like the Archive job (4959) does span the tapes:

03-Dec 17:42 backup1 JobId 4959: Error: Bacula backup1 7.0.5 (28Jul14):
  Build OS:               x86_64-pc-linux-gnu ubuntu 16.04
  Prev Backup JobId:      4933
  Prev Backup Job:        server-job.2018-12-01_02.00.00_42
  New Backup JobId:       4960
  Current JobId:          4959
  Current Job:            Archive.2018-12-03_16.02.06_10
  Backup Level:           Full
  Client:                 None
  FileSet:                "None" 2017-06-19 09:00:00
  Read Pool:              "ServersWeeklyFullFile" (From Job resource)
  Read Storage:           "File" (From Pool resource)
  Write Pool:             "TapeArchive" (From Pool's NextPool resource)
  Write Storage:          "LTO-4" (From Pool's NextPool resource)
  Catalog:                "MyCatalog" (From Client resource)
  Start time:             03-Dec-2018 16:02:09
  End time:               03-Dec-2018 17:42:00
  Elapsed time:           1 hour 39 mins 51 secs
  Priority:               13
  SD Files Written:       300,927
  SD Bytes Written:       203,268,700,917 (203.2 GB)
  Rate:                   33929.0 KB/s
* Volume name(s):         WORMW-1245|WORMW-1246*
  Volume Session Id:      66
  Volume Session Time:    1543248344
  Last Volume Bytes:      151,096,393,728 (151.0 GB)
  SD Errors:              0
  SD termination status:  OK
  Termination:            *** Copying Error ***

Any suggestions are welcome!

Thanks for your help!

Andras

On Mon, Nov 26, 2018 at 12:25 PM Andras Horvai <andras.hor...@gmail.com> wrote:

> Thanks Kern! I will dig deeper into the documentation. First I will try setting up the heartbeat in the SD's config!
>
> On Mon, Nov 26, 2018 at 11:33 AM Kern Sibbald <k...@sibbald.com> wrote:
>
>> Oh there are at least five different places to set up Heart Beat Interval (Dir, SD, and FD). Unfortunately my memory is not good enough to remember them all. Please ask others or see the documentation ...
>>
>> The easiest way is to get on a current version -- e.g. 9.2.2 where it is done by default.
>>
>> Best regards,
>> Kern
>>
>> On 11/26/18 11:13 AM, Andras Horvai wrote:
>>
>> Hello Kern,
>>
>> yes, you are right, I am using Bacula 7.0.5 shipped with Ubuntu 16.04. Where should I set up the heartbeat interval? In the SD's or the FD's config? Or both?
>>
>> Thanks for your help!
>>
>> Andras
>>
>> On Mon, Nov 26, 2018 at 10:56 AM Kern Sibbald <k...@sibbald.com> wrote:
>>
>>> Hello,
>>>
>>> If I remember right you are running on a *very* old Bacula, and the problem seems to be that the backup takes more than 2 hours. One of your comm lines (SD <-> FD) times out. I mention your old version because newer Baculas automatically fix this problem by turning on Heart Beat Interval = 300, which is very likely to resolve your problem.
>>>
>>> Best regards,
>>> Kern
>>>
>>> On 11/26/18 10:34 AM, Andras Horvai wrote:
>>>
>>> Hi Tilman,
>>>
>>> thank you for your answer! But unfortunately the firewall cannot be the problem here :)
>>> The problem happens only with Copy Jobs. The SD and the FD are on the same device. There is no firewall on the machine.
>>> So what I am doing is the following:
>>>
>>> during the weekend I do a full backup with the backup server to file storage on the backup server. Then, starting from Monday, I run a Copy Job from the backup server to a tape device connected to the backup server. This works pretty well until the tape gets full. When the tape gets full, Bacula asks for another tape.
>>> We replace the tape, so the job should continue (as expected), but then at the end we get the job error... So I am puzzled about what is wrong.
>>>
>>> Please feel free to share your ideas...
>>>
>>> Thanks,
>>>
>>> Andras
>>>
>>> On Sun, Nov 25, 2018 at 10:28 PM Tilman Schmidt <til...@imap.cc> wrote:
>>>
>>>> Hi Andras,
>>>>
>>>> is there a firewall between the client and the SD?
>>>> The message
>>>>
>>>> > 20-Nov 12:25 backup-sd JobId 4845: Fatal error: append.c:223 Network error reading from FD. ERR=Connection reset by peer
>>>>
>>>> looks suspiciously like a firewall killing the FD - SD connection because it sees it as idle.
>>>>
>>>> HTH
>>>> Tilman
>>>>
>>>> On 22.11.2018 at 16:04, Andras Horvai wrote:
>>>> > Dear list,
>>>> >
>>>> > I have the following problem:
>>>> > We use copy jobs to copy weekly full backups to WORM tape, but when a tape fills up and needs to be changed, the copy job fails. Bacula says intervention is needed, so we put a new tape in the tape drive. What can be the problem?
>>>> >
>>>> > Copy job report:
>>>> > 20-Nov 12:25 backup-sd JobId 4838: End of Volume at file 0 on device "FileStorage" (/backup), Volume "FILEW-0542"
>>>> > 20-Nov 12:25 backup-sd JobId 4838: End of all volumes.
>>>> > 20-Nov 12:25 backup-sd JobId 4838: Elapsed time=02:45:29, Transfer rate=42.14 M Bytes/second
>>>> > 20-Nov 12:25 backup1 JobId 4838: Error: Bacula backup1 7.0.5 (28Jul14):
>>>> >   Build OS:               x86_64-pc-linux-gnu ubuntu 16.04
>>>> >   Prev Backup JobId:      4837
>>>> >   Prev Backup Job:        db1-job.2018-11-19_23.09.19_03
>>>> >   New Backup JobId:       4845
>>>> >   Current JobId:          4838
>>>> >   Current Job:            Archive.2018-11-20_07.59.53_05
>>>> >   Backup Level:           Full
>>>> >   Client:                 None
>>>> >   FileSet:                "None" 2017-06-19 09:00:00
>>>> >   Read Pool:              "ServersWeeklyFullFile" (From Job resource)
>>>> >   Read Storage:           "File" (From Pool resource)
>>>> >   Write Pool:             "TapeArchive" (From Pool's NextPool resource)
>>>> >   Write Storage:          "LTO-4"
>>>> > 20-Nov 09:39 backup1 JobId 4845: Using Device "LTO-4" to write.
>>>> > 20-Nov 12:01 backup-sd JobId 4845: End of Volume "WORMW-1242" at 386:27137 on device "LTO-4" (/dev/nst0). Write of 64512 bytes got -1.
>>>> > 20-Nov 12:01 backup-sd JobId 4845: Re-read of last block succeeded.
>>>> > 20-Nov 12:01 backup-sd JobId 4845: End of medium on Volume "WORMW-1242" Bytes=764,853,046,272 Blocks=11,855,980 at 20-Nov-2018 12:01.
>>>> > 20-Nov 12:01 backup1 JobId 4845: Created new Volume="WORMW-1243", Pool="TapeArchive", MediaType="LTO-4" in catalog.
>>>> > 20-Nov 12:01 backup-sd JobId 4845: Please mount append Volume "WORMW-1243" or label a new one for:
>>>> >     Job:          db1-job.2018-11-20_07.59.54_12
>>>> >     Storage:      "LTO-4" (/dev/nst0)
>>>> >     Pool:         TapeArchive
>>>> >     Media type:   LTO-4
>>>> > 20-Nov 12:15 backup-sd JobId 4845: Error: The Volume=WORMW-1243 on device="LTO-4" (/dev/nst0) appears to be unlabeled.
>>>> > 20-Nov 12:15 backup-sd JobId 4845: Labeled new Volume "WORMW-1243" on tape device "LTO-4" (/dev/nst0).
>>>> > 20-Nov 12:15 backup-sd JobId 4845: Wrote label to prelabeled Volume "WORMW-1243" on tape device "LTO-4" (/dev/nst0)
>>>> > 20-Nov 12:15 backup-sd JobId 4845: New volume "WORMW-1243" mounted on device "LTO-4" (/dev/nst0) at 20-Nov-2018 12:15.
>>>> > 20-Nov 12:25 backup-sd JobId 4845: Fatal error: append.c:223 Network error reading from FD. ERR=Connection reset by peer
>>>> > 20-Nov 12:25 backup-sd JobId 4845: Elapsed time=02:31:15, Transfer rate=46.11 M Bytes/second
>>>> >   (From Pool's NextPool resource)
>>>> >   Catalog:                "MyCatalog" (From Client resource)
>>>> >   Start time:             20-Nov-2018 09:39:31
>>>> >   End time:               20-Nov-2018 12:25:04
>>>> >   Elapsed time:           2 hours 45 mins 33 secs
>>>> >   Priority:               13
>>>> >   SD Files Written:       4,792
>>>> >   SD Bytes Written:       418,488,802,122 (418.4 GB)
>>>> >   Rate:                   42131.2 KB/s
>>>> >   Volume name(s):         WORMW-1242|WORMW-1243
>>>> >   Volume Session Id:      9
>>>> >   Volume Session Time:    1542631131
>>>> >   Last Volume Bytes:      33,060,787,200 (33.06 GB)
>>>> >   SD Errors:              0
>>>> >   SD termination status:  OK
>>>> >   Termination:            *** Copying Error ***
>>>> >
>>>> > Regarding copy jobs the FD and the SD are on the same machine.
>>>> >
>>>> > we are using:
>>>> >
>>>> > Distributor ID: Ubuntu
>>>> > Description:    Ubuntu 16.04.4 LTS
>>>> > Release:        16.04
>>>> > Codename:       xenial
>>>> >
>>>> > bacula 7.0.5
>>>> >
>>>> > Tape drive: HP Ultrium 4-SCSI
>>>> >
>>>> > Thanks for help,
>>>> >
>>>> > Andras
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users