Hi all, I ran into an interesting situation yesterday:
I was doing some maintenance on the MySQL server that holds our Bacula database. I had hoped to be finished just in time for the next scheduled backup jobs, but such was not the case, and when the jobs started, the database was actually still down. My bad, obviously. However, I don't think Bacula handled the situation very gracefully and/or resiliently.

What happened was that Bacula immediately failed the first job with a fatal error and continued with the next job, which of course also failed, and the next, and the next, etc., leaving me with a whole bunch of failed jobs. To make things worse, these jobs vanished from bconsole into thin air and were nowhere to be found again. I assume this is because the job status could not be updated in the database.

I would have hoped/expected that, in such a case, Bacula would back off for a (configurable) short amount of time and then retry the job. In the meantime, warnings could be sent to the administrator, and eventually a permanent timeout could occur. Even then, there would be no real use in discarding the queued jobs, would there? Blindly continuing with the next queued job after a failed database connection is practically useless and a bit silly, as there is only a very slim chance that the exact same database will magically be up and running mere seconds later.

Anyway, I don't (really) mean this as a rant or anything, but I must say that I was quite surprised by this chain of events, which threw me into a whole new world of pain while trying to re-run the failed jobs. I ultimately gave up on that, because Bacula kept asking me to mount two tapes simultaneously (?). Obviously, in this case, there are some things I (c|sh)ould have done myself to prevent this, e.g. not doing maintenance right before/during backups, shutting down Bacula during database maintenance, and/or rescheduling the jobs to a later time beforehand.
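To make the idea concrete, the back-off/retry behaviour I have in mind could look roughly like this (purely an illustrative Python sketch, not actual Bacula code; all names, parameters, and defaults here are made up):

```python
import time

def run_job_with_retry(job, max_attempts=5, initial_delay=60, notify=print):
    """Retry a backup job whose database connection failed, instead of
    failing it permanently and blindly moving on to the next queued job."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return job()  # the job raises ConnectionError while the DB is down
        except ConnectionError as err:
            # warn the administrator, then back off before retrying
            notify(f"attempt {attempt}/{max_attempts} failed: {err}; "
                   f"retrying in {delay}s")
            time.sleep(delay)
            delay *= 2  # exponential back-off between attempts
    # only after the configured number of attempts do we give up for good
    raise TimeoutError("database still unreachable; giving up on this job")
```

The point is simply that the queue survives a transient database outage: the administrator gets warnings while the retries run, and a permanent failure only happens after a configurable timeout rather than on the first connection error.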
However, the same thing would have happened if, for instance, the database had crashed right before backup time and I had not had time to respond to the resulting monitoring alerts.

So, what's your view on this? Should I just STFU and make bloody damned sure that the database is up and running whenever jobs are scheduled to run? Is there perhaps some configuration option I missed that does what I proposed (back off/retry)? Is it a bug, a feature, a possible future improvement, supergrover?

Kind regards,
Leander