Hello,

I am using Bacula to write to a small LTO-8 library and have lately been 
having trouble getting jobs to complete. I have used this setup to archive 
data to tape for about a year and a half, and until recently it worked well. 
My jobs usually write ~60 TB at a time, and I previously completed a larger 
archive of about 1.2 PB split across 4 jobs. My current task is to back up 
two NFS-mounted areas, one of ~600 TB and the second of about 1 PB. I created 
jobs that break these areas into 100-200 TB chunks and have been running 4-6 
of them at a time, split across two tape drives, although I have also gotten 
this error when only one job is running. The jobs keep failing, and the logs 
show slight variations of:

    bacula-fd JobId 734: Error: bsock.c:649 Write error sending 996 bytes to Storage daemon:grendel.igs.umaryland.edu:9103: ERR=Connection reset by peer

Below is some information on my setup.

OS:              RHEL 8.5
Bacula version:  9.0.6
DB:              PostgreSQL 10.4
Host:            Director, SD, FD, and DB all run on the same host
SD:              Dell ML3 with 2x LTO-8 drives

Each job writes a seemingly random amount of data before it fails, anywhere 
between 2 TB and 60 TB.

Things I have tried that have not resolved the error:
Enabling spooling: this greatly reduced throughput, but the same error occurred.
Heartbeat Interval: currently set to 30 seconds; I have also tried 60 seconds.
Limiting concurrent jobs: the same failure occurs even with a single job running.
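
For reference, these are the kinds of directives involved. This is an 
illustrative sketch, not a copy of my live configuration: the resource names 
are placeholders, I am assuming the heartbeat is set in both the FileDaemon 
and Storage resources, and other required directives are omitted.

```conf
# bacula-fd.conf, FileDaemon resource
# (placeholder name; other required directives omitted)
FileDaemon {
  Name = example-fd
  # Currently 30 seconds; 60 seconds was also tried
  Heartbeat Interval = 30 seconds
}

# bacula-sd.conf, Storage resource
# (placeholder name; other required directives omitted)
Storage {
  Name = example-sd
  Heartbeat Interval = 30 seconds
  # Single-job test: at most one job at a time on this SD
  Maximum Concurrent Jobs = 1
}

# bacula-dir.conf, Job resource for one archive chunk
# (hypothetical job name; other required directives omitted)
Job {
  Name = "archive-chunk-01"
  # Data spooling: throughput dropped sharply, same error
  Spool Data = yes
}
```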

Please let me know what additional information I can provide. Thank you in 
advance for your help!

Erik Anderson
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users