Hi all,

I'm running into an issue with some bacula-fd instances and hoping someone can point me in the right direction.

In short: I have bacula-fd instances that are clearly running jobs (confirmed via strace), but they often time out when I run `status client=CLIENTNAME`. They only seem reliably responsive when idle.

Details:

  * Bacula version: 9.6.6 (yes, I know it's old; an upgrade is planned).
  * Setup: two hosts (`zhomebackup[1-2]`), each running both an SD and an FD. A
    script at the beginning of each job snapshots the NFS shares, mounts them,
    and outputs the file paths to back up (a trimmed job definition is further
    down).
  * Problem: each host struggles to handle more than 6–7 jobs effectively;
    going beyond that causes a drop in the aggregate file scan rate.
  * Attempted solution: spun up additional FD instances on separate ports
    (originally inside Docker, now running natively on non-standard ports;
    sketched just below this list). These new instances are only
    /intermittently/ responsive to `status client`, even with only 1–3 jobs.
    The original FD (on the default port) remains responsive, even with 6–7
    jobs.

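To save people digging through the tarball right away, here's roughly what one of the extra FD instances looks like (names, ports, and passwords below are placeholders; the real files are in the attachment). Each instance has its own systemd unit that just runs `bacula-fd -f -c /etc/bacula/bacula-fd2.conf` against its own config and working directory:

    # /etc/bacula/bacula-fd2.conf -- second FD instance on the same host (trimmed)
    Director {
      Name = backup-dir                        # placeholder; matches the Director's name
      Password = "CHANGEME"
    }

    FileDaemon {
      Name = zhomebackup1-fd2
      FDport = 9112                            # non-standard port, so it can coexist with the stock FD on 9102
      WorkingDirectory = /var/lib/bacula-fd2   # separate working/pid dirs per instance
      Pid Directory = /run/bacula-fd2
      Maximum Concurrent Jobs = 10
    }

and the matching Client resource on the Director side:

    Client {
      Name = zhomebackup1-fd2
      Address = zhomebackup1
      FDPort = 9112                            # points the Director at the extra instance
      Password = "CHANGEME"
      Catalog = MyCatalog
      Maximum Concurrent Jobs = 10
    }
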
I'm wondering if this could be a shared resource issue or some FD limitation I'm not accounting for. Or is there a better way to scale job throughput?

I've attached a tarball containing systemd service files, FD configs, and relevant parts of the Director config, including an example job definition.
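
For quick reference, here is a trimmed version of one job definition (names and paths are simplified; the real one is in the tarball). The ClientRunBeforeJob script is the one that snapshots the NFS share, mounts it, and writes out the paths that the FileSet then picks up:

    Job {
      Name = "nfs-home-zhomebackup1-fd2"
      Type = Backup
      Level = Incremental
      Client = zhomebackup1-fd2                # one of the extra FD instances
      FileSet = "nfs-home"                     # reads the path list the script produces
      Schedule = "Nightly"
      Storage = zhomebackup1-sd
      Pool = Default
      Messages = Standard
      # snapshot + mount the NFS share, then emit the paths to back up
      ClientRunBeforeJob = "/usr/local/sbin/snap-and-mount.sh home"
      Maximum Concurrent Jobs = 1
    }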

Any insights would be greatly appreciated.

Thanks,
Lloyd

--
Lloyd Brown
HPC Systems Administrator
Office of Research Computing
Brigham Young University
http://rc.byu.edu

Attachment: bacula_diag.tar.gz
Description: application/gzip

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
