Hi all,

I'm running into an issue with some bacula-fd instances and hoping someone can point me in the right direction.
In short: I have bacula-fd instances that are clearly running jobs (confirmed via strace), but they often time out when I run `status client=CLIENTNAME`. They only seem reliably responsive when idle.
Details:

* Bacula version: 9.6.6 (yes, I know it's old; an upgrade is planned).
* Setup: Two hosts (`zhomebackup[1-2]`) running both SD and FD. A script at the beginning of each job snapshots NFS shares, mounts them, and outputs the file paths to back up.
* Problem: These hosts struggle to handle more than 6–7 jobs effectively; going beyond that causes a drop in aggregate file scan rates.
* Attempted solution: Spun up additional FD instances on separate ports (originally inside Docker, but now just running natively on non-standard ports); a trimmed-down sketch of the layout is below. These new instances are /intermittently/ responsive to `status client`, even with only 1–3 jobs, while the original FD on the default port remains responsive even with 6–7 jobs.

I'm wondering if this could be a shared-resource issue or some FD limitation I'm not accounting for. Or is there a better way to scale job throughput?
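In case it helps frame the question, here's roughly how the extra instances are laid out. This is a trimmed-down sketch with placeholder names, paths, passwords, and port numbers (the real configs are in the attached tarball); each extra FD gets its own config file, working/pid directories, and port:

    # /etc/bacula/bacula-fd2.conf  -- second FD instance (illustrative names/port)
    FileDaemon {
      Name = zhomebackup1-fd2
      FDport = 9112                            # default instance stays on 9102
      WorkingDirectory = /var/lib/bacula/fd2   # separate from the default instance
      Pid Directory = /run/bacula/fd2
      Maximum Concurrent Jobs = 10
    }

    Director {
      Name = backup-dir
      Password = "xxxxxx"
    }

    Messages {
      Name = Standard
      director = backup-dir = all, !skipped, !restored
    }

    # /etc/systemd/system/bacula-fd2.service
    [Unit]
    Description=Bacula File Daemon (second instance)
    After=network-online.target

    [Service]
    Type=simple
    ExecStart=/usr/sbin/bacula-fd -f -c /etc/bacula/bacula-fd2.conf
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

On the Director side, each instance shows up as its own Client resource pointing at the same host but the non-standard port, e.g.:

    Client {
      Name = zhomebackup1-fd2
      Address = zhomebackup1
      FDPort = 9112
      Password = "xxxxxx"
      Maximum Concurrent Jobs = 10
    }

The separate WorkingDirectory/Pid Directory per instance is just to keep the daemons from colliding; jobs reference these extra Client names in the usual way.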
I've attached a tarball containing systemd service files, FD configs, and relevant parts of the Director config, including an example job definition.
Any insights would be greatly appreciated.

Thanks,
Lloyd

--
Lloyd Brown
HPC Systems Administrator
Office of Research Computing
Brigham Young University
http://rc.byu.edu
bacula_diag.tar.gz