One more suggestion: reducing the spool size has also helped. I have a mix of clients, and initially all of them used a 500GB spool size. My 10Gb-connected clients were fine, but my 1Gb clients would time out when things were busy. I dropped the spool size to 50GB for the 1Gb clients and now rarely have a timeout.
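A minimal sketch of what a per-client spool limit could look like in bacula-dir.conf, assuming the limit is set per job with the Spool Size directive. The job name is hypothetical and the other required Job directives are left out, so treat this as an illustration rather than a copy of a working config:

```
# bacula-dir.conf (fragment; the job name is hypothetical)
Job {
  Name = "slow-link-client-backup"   # hypothetical job for a 1Gb-connected client
  Spool Data = yes                   # spool to disk before writing to tape
  Spool Size = 50G                   # per-job spool limit for this client
  # other required Job directives (Type, Client, FileSet, Storage, Pool, ...) omitted
}
```

A device-wide cap can be set instead with Maximum Spool Size in the Device resource of bacula-sd.conf, but that applies to every job using the device. The "User specified Device spool size reached" message in the log quoted below appears to come from that device-level limit.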
Patti Clark
Linux System Administrator
R&D Systems Support Oak Ridge National Laboratory

On 9/4/15, 7:08 AM, "Clark, Patti" <clar...@ornl.gov> wrote:

>My heartbeat interval is set to 300 - you should probably increase yours.
>I originally had a smaller interval and had some clients that would time
>out - particularly if they were Windows clients.
>
>Patti Clark
>Linux System Administrator
>R&D Systems Support Oak Ridge National Laboratory
>
>
>
>On 9/3/15, 7:16 PM, "Jeffrey R. Lang" <jrl...@uwyo.edu> wrote:
>
>>First let me say thanks to Kern and all those that have helped make
>>bacula a great tool.
>>
>>My current backup environment consists of a server, a VTL and a tape
>>library connected by a 10 Gb network. Bacula is currently at 5.2.13.
>>I plan on upgrading once I've integrated the tape library and things are
>>working. That seems like a good starting point for an upgrade.
>>
>>My issue is that when jobs are destined for the tape library I have
>>enabled job spooling, but these jobs always time out after the first
>>spooled block of data is written to tape. Here's an example:
>>
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior Full backup Job record
>>found.
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior or suitable Full backup
>>found in catalog. Doing FULL backup.
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: Start Backup JobId 12570,
>>Job=bighorn-home.2015-09-03_11.19.32_03
>>03-Sep 11:20 bkupsvr2-dir JobId 12570: Using Device "LTO5-0" to write.
>>03-Sep 11:21 bkupsvr2-sd JobId 12570: 3304 Issuing autochanger "load slot
>>94, drive 0" command.
>>03-Sep 11:22 bkupsvr2-sd JobId 12570: 3305 Autochanger "load slot 94,
>>drive 0", status is OK.
>>03-Sep 11:22 bkupsvr2-sd JobId 12570: Volume "000094L5" previously
>>written, moving to end of data.
>>03-Sep 11:23 bkupsvr2-sd JobId 12570: Ready to append to end of Volume
>>"000094L5" at file=1724.
>>03-Sep 11:23 bkupsvr2-sd JobId 12570: Spooling data ...
>>03-Sep 15:07 bkupsvr2-sd JobId 12570: User specified Device spool size
>>reached: DevSpoolSize=322,122,610,512 MaxDevSpoolSize=322,122,547,200
>>03-Sep 15:07 bkupsvr2-sd JobId 12570: Writing spooled data to Volume.
>>Despooling 322,122,610,512 bytes ...
>>03-Sep 15:23 mmcnsd4-fd JobId 12570: Error: bsock.c:429 Write error
>>sending 253977 bytes to Storage daemon:bkupsvr2.gg.uwyo.edu:9103:
>>ERR=Connection timed out
>>03-Sep 15:23 mmcnsd4-fd JobId 12570: Fatal error: backup.c:1200 Network
>>send error to SD. ERR=Connection timed out
>>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Director's comm line to SD
>>dropped.
>>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Bacula bkupsvr2-dir 5.2.13
>>(19Jan13):
>>  Build OS:               x86_64-unknown-linux-gnu redhat Enterprise
>>release
>>  JobId:                  12570
>>  Job:                    bighorn-home.2015-09-03_11.19.32_03
>>  Backup Level:           Full (upgraded from Incremental)
>>  Client:                 "mmonsd4-fd" 5.2.13 (19Jan13)
>>x86_64-unknown-linux-gnu,redhat,
>>  FileSet:                "bighorn-home" 2015-06-12 08:21:10
>>  Pool:                   "ARCC" (From Job resource)
>>  Catalog:                "MyCatalog" (From Client resource)
>>  Storage:                "NEO4200" (From Pool resource)
>>  Scheduled time:         03-Sep-2015 11:19:31
>>  Start time:             03-Sep-2015 11:20:26
>>  End time:               03-Sep-2015 15:23:28
>>  Elapsed time:           4 hours 3 mins 2 secs
>>  Priority:               10
>>  FD Files Written:       1,805,322
>>  SD Files Written:       0
>>  FD Bytes Written:       321,716,097,371 (321.7 GB)
>>  SD Bytes Written:       0 (0 B)
>>  Rate:                   22062.5 KB/s
>>  Software Compression:   None
>>  VSS:                    no
>>  Encryption:             no
>>  Accurate:               yes
>>  Volume name(s):         000094L5
>>  Volume Session Id:      1
>>  Volume Session Time:    1441300753
>>  Last Volume Bytes:      1,817,725,514,752 (1.817 TB)
>>  Non-fatal FD errors:    2
>>  SD Errors:              0
>>  FD termination status:  Error
>>  SD termination status:  Error
>>  Termination:            *** Backup Error ***
>>
>>If I turn off job spooling then the job will complete as expected.
>>
>>I have enabled "heartbeats" on the client, storage daemon and director
>>but that didn't help.
>>
>>My current client configuration is this:
>>
>>FileDaemon {                          # this is me
>>  Name = mmcnsd4-fd
>>  FDport = 9102                       # where we listen for the director
>>  WorkingDirectory = /usr/local/bacula/working
>>  Pid Directory = /usr/local/bacula/working
>>  Maximum Concurrent Jobs = 20
>>  Maximum Network Buffer Size = 262144
>>  Heartbeat Interval = 60
>>}
>>
>>My storage daemon configuration is:
>>Storage {                             # definition of myself
>>  Name = bkupsvr2-sd
>>  SDPort = 9103                       # Director's port
>>  WorkingDirectory = "/usr/local/bacula/working"
>>  Pid Directory = "/usr/local/bacula/working"
>>  Maximum Concurrent Jobs = 20
>>  Heartbeat Interval = 60
>>}
>>
>>The only thing I can see is that with spooling turned off, data is
>>constantly flowing over the network connection. With spooling turned
>>on there is a quiet period on the network connection.
>>
>>I've talked with my network engineer about this and he says there's
>>nothing in the network that would cause the application to close the
>>connection.
>>
>>So has anyone seen this problem before?
>>Any ideas on what to look at to figure this out?
>>
>>jeff
>>
>>
>
>
>------------------------------------------------------------------------------
>_______________________________________________
>Bacula-users mailing list
>Bacula-users@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/bacula-users