To everyone who replied to my request for help: I have reduced the size of my spool buffer from 300G to 50G and run tests. Things work better, but I'm still getting timeouts. Now they appear to be random across the output despooling; what I mean is that some jobs will complete one to several output despools and then fail with a timeout.
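To put rough numbers on why the spool size matters: while a spool buffer is being despooled to tape, the client (FD) connection goes quiet for roughly spool_size / drive_rate seconds. A back-of-the-envelope sketch, assuming an LTO-5 native rate of about 140 MB/s (an assumption, not a measured figure from this setup):

```python
# Rough idle-window estimate for the FD connection during despooling.
# The 140 MB/s drive rate is an assumed LTO-5 native figure, not measured.
DRIVE_RATE = 140e6  # bytes per second (assumption)

def idle_window_seconds(spool_bytes: float, drive_rate: float = DRIVE_RATE) -> float:
    """Approximate seconds the client connection sits quiet while one
    spool buffer is written out to tape."""
    return spool_bytes / drive_rate

print(idle_window_seconds(300e9))  # ~2143 s, about 36 minutes quiet
print(idle_window_seconds(50e9))   # ~357 s, about 6 minutes quiet
```

Even the smaller buffer leaves several minutes of network silence per despool, so a heartbeat interval shorter than the smallest idle timeout anywhere in the path is still needed.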
I'm now going to follow up with the larger heartbeat interval to see if that helps. Per Kern's recommendation I will upgrade to 7.2.0; I had already tried 7.0.5, but the timeouts persisted and caused jobs that had previously worked for years to start failing with timeouts. I'll keep looking into this and see what I can discover.

jeff

On 09/12/2015 11:04 AM, Kern Sibbald wrote:

The engineer is probably mistaken. The line seems clearly to have dropped. Perhaps you are still missing a Heartbeat Interval somewhere -- there are quite a number of them. The other thing is that I would recommend a Heartbeat Interval of 300. Another alternative is to upgrade to version 7.2.0, which I believe has the Heartbeat Interval set to 300 by default.

I also recommend not setting the Maximum Network Buffer Size, as Bacula generally figures that out for itself. Once you get your backups working, you can experiment with it.

Best regards,
Kern

On 15-09-03 04:16 PM, Jeffrey R. Lang wrote:

First let me say thanks to Kern and all those who have helped make Bacula a great tool.

My current backup environment consists of a server, a VTL, and a tape library connected by a 10Gig network, with Bacula currently at 5.2.13. I plan on upgrading once I've integrated the tape library and things are working -- a good starting point for an upgrade.

My issue is that for jobs destined for the tape library I have enabled job spooling, but these jobs always time out after the first spooled block of data is written to tape. Here's an example:

03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior Full backup Job record found.
03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior or suitable Full backup found in catalog. Doing FULL backup.
03-Sep 11:19 bkupsvr2-dir JobId 12570: Start Backup JobId 12570, Job=bighorn-home.2015-09-03_11.19.32_03
03-Sep 11:20 bkupsvr2-dir JobId 12570: Using Device "LTO5-0" to write.
03-Sep 11:21 bkupsvr2-sd JobId 12570: 3304 Issuing autochanger "load slot 94, drive 0" command.
03-Sep 11:22 bkupsvr2-sd JobId 12570: 3305 Autochanger "load slot 94, drive 0", status is OK.
03-Sep 11:22 bkupsvr2-sd JobId 12570: Volume "000094L5" previously written, moving to end of data.
03-Sep 11:23 bkupsvr2-sd JobId 12570: Ready to append to end of Volume "000094L5" at file=1724.
03-Sep 11:23 bkupsvr2-sd JobId 12570: Spooling data ...
03-Sep 15:07 bkupsvr2-sd JobId 12570: User specified Device spool size reached: DevSpoolSize=322,122,610,512 MaxDevSpoolSize=322,122,547,200
03-Sep 15:07 bkupsvr2-sd JobId 12570: Writing spooled data to Volume. Despooling 322,122,610,512 bytes ...
03-Sep 15:23 mmcnsd4-fd JobId 12570: Error: bsock.c:429 Write error sending 253977 bytes to Storage daemon:bkupsvr2.gg.uwyo.edu:9103: ERR=Connection timed out
03-Sep 15:23 mmcnsd4-fd JobId 12570: Fatal error: backup.c:1200 Network send error to SD. ERR=Connection timed out
03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Director's comm line to SD dropped.
03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Bacula bkupsvr2-dir 5.2.13 (19Jan13):
  Build OS:               x86_64-unknown-linux-gnu redhat Enterprise release
  JobId:                  12570
  Job:                    bighorn-home.2015-09-03_11.19.32_03
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "mmonsd4-fd" 5.2.13 (19Jan13) x86_64-unknown-linux-gnu,redhat,
  FileSet:                "bighorn-home" 2015-06-12 08:21:10
  Pool:                   "ARCC" (From Job resource)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "NEO4200" (From Pool resource)
  Scheduled time:         03-Sep-2015 11:19:31
  Start time:             03-Sep-2015 11:20:26
  End time:               03-Sep-2015 15:23:28
  Elapsed time:           4 hours 3 mins 2 secs
  Priority:               10
  FD Files Written:       1,805,322
  SD Files Written:       0
  FD Bytes Written:       321,716,097,371 (321.7 GB)
  SD Bytes Written:       0 (0 B)
  Rate:                   22062.5 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         000094L5
  Volume Session Id:      1
  Volume Session Time:    1441300753
  Last Volume Bytes:      1,817,725,514,752 (1.817 TB)
  Non-fatal FD errors:    2
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

If I turn off job spooling, the job completes as expected. I have enabled "heartbeats" on the client, storage daemon and director, but that didn't help.

My current client configuration is this:

FileDaemon {                          # this is me
  Name = mmcnsd4-fd
  FDport = 9102                       # where we listen for the director
  WorkingDirectory = /usr/local/bacula/working
  Pid Directory = /usr/local/bacula/working
  Maximum Concurrent Jobs = 20
  Maximum Network Buffer Size = 262144
  Heartbeat Interval = 60
}

My storage daemon configuration is:

Storage {                             # definition of myself
  Name = bkupsvr2-sd
  SDPort = 9103                       # Director's port
  WorkingDirectory = "/usr/local/bacula/working"
  Pid Directory = "/usr/local/bacula/working"
  Maximum Concurrent Jobs = 20
  Heartbeat Interval = 60
}

The only thing I can see is that with spooling turned off, data is constantly flowing over the network connection; with spooling turned on there is a quiet period on the network connection. I've talked with my network engineer about this, and he says there's nothing in the network that would cause the application to close the connection.

So, has anyone seen this problem before? Any ideas on what to look at to figure this out?

jeff

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
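Putting Kern's suggestions together, a hedged sketch of what the changed resources might look like (resource names taken from the configs quoted above; the Maximum Network Buffer Size line is dropped and the heartbeat raised to 300 -- verify directive placement against the Bacula manual for your version, since a Heartbeat Interval may also need to be set in the Director's configuration):

```conf
# bacula-fd.conf -- sketch: Maximum Network Buffer Size removed, heartbeat raised
FileDaemon {
  Name = mmcnsd4-fd
  FDport = 9102
  WorkingDirectory = /usr/local/bacula/working
  Pid Directory = /usr/local/bacula/working
  Maximum Concurrent Jobs = 20
  Heartbeat Interval = 300
}

# bacula-sd.conf -- sketch: heartbeat raised to match
Storage {
  Name = bkupsvr2-sd
  SDPort = 9103
  WorkingDirectory = "/usr/local/bacula/working"
  Pid Directory = "/usr/local/bacula/working"
  Maximum Concurrent Jobs = 20
  Heartbeat Interval = 300
}
```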