My heartbeat interval is set to 300 - you should probably increase yours. I originally had a smaller interval and had some clients that would time out - particularly if they were Windows clients.
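As a rough sketch of what I mean, assuming your resources otherwise stay exactly as you posted them and that you raise the value in the same places you already set it, the change would look something like this. The 300 is simply the value that works at my site, not a tuned recommendation, and I have left out the Director resource because it was not in your message:

```
# bacula-fd.conf on the client (only the changed directive is shown;
# the other directives stay as in the original post)
FileDaemon {
  Name = mmcnsd4-fd
  Heartbeat Interval = 300   # was 60; 300 is an assumption taken from my setup
}

# bacula-sd.conf on the storage server (again, only the changed directive)
Storage {
  Name = bkupsvr2-sd
  Heartbeat Interval = 300   # was 60; 300 is an assumption taken from my setup
}
```

The daemons need to re-read their configuration before the new interval applies, so restart them and rerun the spooled job to see whether the timeout still occurs.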
Patti Clark
Linux System Administrator
R&D Systems Support
Oak Ridge National Laboratory

On 9/3/15, 7:16 PM, "Jeffrey R. Lang" <jrl...@uwyo.edu> wrote:

>First let me say thanks to Kern and all those that have helped make
>bacula a great tool.
>
>My backup environment currently consists of a server, a VTL and a
>tape library connected by a 10GiG network. Bacula is currently at 5.2.13.
>I plan on upgrading once I've integrated the tape library and things are
>working. A good starting point for an upgrade.
>
>My issue is that when jobs are destined for the tape library I have enabled
>job spooling, but these jobs always time out after the first spooled block
>of data is written to tape. Here's an example:
>
>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior Full backup Job record
>found.
>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior or suitable Full backup
>found in catalog. Doing FULL backup.
>03-Sep 11:19 bkupsvr2-dir JobId 12570: Start Backup JobId 12570,
>Job=bighorn-home.2015-09-03_11.19.32_03
>03-Sep 11:20 bkupsvr2-dir JobId 12570: Using Device "LTO5-0" to write.
>03-Sep 11:21 bkupsvr2-sd JobId 12570: 3304 Issuing autochanger "load slot
>94, drive 0" command.
>03-Sep 11:22 bkupsvr2-sd JobId 12570: 3305 Autochanger "load slot 94,
>drive 0", status is OK.
>03-Sep 11:22 bkupsvr2-sd JobId 12570: Volume "000094L5" previously
>written, moving to end of data.
>03-Sep 11:23 bkupsvr2-sd JobId 12570: Ready to append to end of Volume
>"000094L5" at file=1724.
>03-Sep 11:23 bkupsvr2-sd JobId 12570: Spooling data ...
>03-Sep 15:07 bkupsvr2-sd JobId 12570: User specified Device spool size
>reached: DevSpoolSize=322,122,610,512 MaxDevSpoolSize=322,122,547,200
>03-Sep 15:07 bkupsvr2-sd JobId 12570: Writing spooled data to Volume.
>Despooling 322,122,610,512 bytes ...
>03-Sep 15:23 mmcnsd4-fd JobId 12570: Error: bsock.c:429 Write error
>sending 253977 bytes to Storage daemon:bkupsvr2.gg.uwyo.edu:9103:
>ERR=Connection timed out
>03-Sep 15:23 mmcnsd4-fd JobId 12570: Fatal error: backup.c:1200 Network
>send error to SD. ERR=Connection timed out
>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Director's comm line to SD
>dropped.
>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Bacula bkupsvr2-dir 5.2.13
>(19Jan13):
>  Build OS:               x86_64-unknown-linux-gnu redhat Enterprise
>release
>  JobId:                  12570
>  Job:                    bighorn-home.2015-09-03_11.19.32_03
>  Backup Level:           Full (upgraded from Incremental)
>  Client:                 "mmonsd4-fd" 5.2.13 (19Jan13)
>x86_64-unknown-linux-gnu,redhat,
>  FileSet:                "bighorn-home" 2015-06-12 08:21:10
>  Pool:                   "ARCC" (From Job resource)
>  Catalog:                "MyCatalog" (From Client resource)
>  Storage:                "NEO4200" (From Pool resource)
>  Scheduled time:         03-Sep-2015 11:19:31
>  Start time:             03-Sep-2015 11:20:26
>  End time:               03-Sep-2015 15:23:28
>  Elapsed time:           4 hours 3 mins 2 secs
>  Priority:               10
>  FD Files Written:       1,805,322
>  SD Files Written:       0
>  FD Bytes Written:       321,716,097,371 (321.7 GB)
>  SD Bytes Written:       0 (0 B)
>  Rate:                   22062.5 KB/s
>  Software Compression:   None
>  VSS:                    no
>  Encryption:             no
>  Accurate:               yes
>  Volume name(s):         000094L5
>  Volume Session Id:      1
>  Volume Session Time:    1441300753
>  Last Volume Bytes:      1,817,725,514,752 (1.817 TB)
>  Non-fatal FD errors:    2
>  SD Errors:              0
>  FD termination status:  Error
>  SD termination status:  Error
>  Termination:            *** Backup Error ***
>
>If I turn off job spooling then the job will complete as expected.
>
>I have enabled "heartbeats" on the client, storage daemon and director
>but that didn't help.
>
>My current client configuration is this:
>
>FileDaemon {                          # this is me
>  Name = mmcnsd4-fd
>  FDport = 9102                  # where we listen for the director
>  WorkingDirectory = /usr/local/bacula/working
>  Pid Directory = /usr/local/bacula/working
>  Maximum Concurrent Jobs = 20
>  Maximum Network Buffer Size = 262144
>  Heartbeat Interval = 60
>}
>
>My storage daemon configuration is:
>Storage {                             # definition of myself
>  Name = bkupsvr2-sd
>  SDPort = 9103                  # Director's port
>  WorkingDirectory = "/usr/local/bacula/working"
>  Pid Directory = "/usr/local/bacula/working"
>  Maximum Concurrent Jobs = 20
>  Heartbeat Interval = 60
>}
>
>The only thing I can see is that with spooling turned off, data is
>constantly flowing over the network connection. With spooling
>turned on there is a quiet period on the network connection.
>
>I've talked with my network engineer about this and he says there's
>nothing in the network that would cause the application to close the
>connection.
>
>So has anyone seen this problem before?
>Any ideas on what to look at to figure this out?
>
>jeff
>
>
------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users