Hello, I am trying to use Bacula to write to a small LTO-8 library, and lately I have been having trouble getting jobs to complete. I have been using this setup to archive data to tape for about a year and a half, and it has worked well. My jobs usually write ~60 TB at a time, and I previously completed a larger backup of about 1.2 PB split across 4 jobs.

My current task is to back up two NFS-mounted areas: one of ~600 TB and a second of about 1 PB. I created jobs that break these areas into 100-200 TB chunks and have been running 4-6 at a time, split across two tape drives. I have also gotten the error when only one job is running. The jobs keep failing, and the logs show slight variations of:

bacula-fd JobId 734: Error: bsock.c:649 Write error sending 996 bytes to Storage daemon:grendel.igs.umaryland.edu:9103: ERR=Connection reset by peer

Below is some information on my setup.
OS: RHEL 8.5
Bacula version: 9.0.6
DB: Postgres 10.4
Dir, SD, FD and DB all run on the same host.
SD: Dell ML3 with 2x LTO-8 drives

The jobs get to a random point before they fail, anywhere between 2 and 60 TB.

Things I have tried that have not resolved the error:
- Enabling spooling: this greatly reduced speed, and the same error still occurred.
- Heartbeat Interval: currently set to 30 seconds; I have also tried 60 seconds.
- Limiting concurrent jobs: even with a single job, the same behavior occurred.

Please let me know what additional information I can provide. Thank you in advance for your help!

Erik Anderson
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users