To recap: I run Bacula in a disk-to-disk-to-removable-disk configuration (for about the past year, since I gave up on replacing failed LTO drives every other year), with its catalog DB on MariaDB Galera Cluster and DB connections round-robinned via HAProxy (the catalog has been on the cluster for about three years, and behind HAProxy for about the past year). This setup has Just Worked for as long as I've been running it, with the one caveat that Bacula must be built with attribute spooling disabled, because the attribute spooling code for the MySQL driver just does not work in a useful way.
As soon as I updated to 9.6.5, jobs started hanging. Roughly one job in three would hang mid-file, for no reason I was ever able to determine. It was sometimes possible to cancel a hung job, but it would take a very long time. If I mistakenly tried to cancel a second hung job while the Director was still working on cancelling the first, the Director would almost invariably crash. Changing the configuration to connect directly to the local node of the cluster, treating it as a standalone MySQL instance, slightly mitigated the problem but did not fix it.

I eventually discovered that I could turn the hung-job problem on and off simply by changing the Director version *ONLY*. With a 9.6.5 Director, jobs hung; with a 9.6.3 Director, they didn't, with no other changes. No matter how deeply I dug into it, I was never able to isolate the specific cause of the problem.

I've now been running on 9.6.6 for a week and have not seen a single hung job. I *THINK* I can safely state that whatever the root cause of the problem was in 9.6.5, it is fixed in 9.6.6. Since it is already reported that a TLS issue in 9.6.5 is fixed in 9.6.6, I'm going to *speculate* (with the caveat that I have no definite proof) that both issues were actually networking problems, that jobs were hanging because communication with the clients silently failed, and that the fix for the TLS problem ALSO fixed the hung-jobs problem by fixing the client communication failures.

-- 
Phil Stracchino
Babylon Communications
ph...@caerllewys.net
p...@co.ordinate.org
Landline: +1.603.293.8485
Mobile: +1.603.998.6958