In the message dated: Sat, 23 Feb 2008 12:40:43 +0100, Kern Sibbald used the subject line <[Bacula-users] Improving job scheduling flexibility> and wrote:
=> Hello,
=>
=> As you know, current job scheduling has a few deficiencies, particularly if for
=> some reason your backups get blocked (a bad tape driver or operator
=> intervention required), which can lead to a big pile of duplicate jobs being
=> scheduled. Or if a job takes so long that it is still running when the next
=> instance of the same job is launched (i.e., a backup that takes more than
=> 24 hours).
[SNIP!]
=>
=> My current idea is to create a new "DuplicateJobs" resource and a new
=> Duplicate Jobs directive which would point to the duplicate jobs resource.

Sounds great!

=> The reason for the resource is that there are just too many different
=> variations that it would require a lot of new directives, and it seems a
=> shame to add them to every Job.
=>
=> My current design calls for a Duplicate Jobs resource that looks something
=> like the following:
=>
=> DuplicateJobs {
[SNIP!]
=>
=>   Job Proximity = <time-interval> (0)
=>
=> }
=>
[SNIP!]
=>
=> Finally Job Proximity is to allow a bit of overlap. For example, if a job has
=> been running 20 minutes or ran 20 minutes ago, you might want to not apply
=> the rules.

Could you elaborate on what this means to you a bit more?

I see the distinction here being mainly in terms of jobs that take a "long"
time vs. a "short" time. If the entire job normally takes 30 minutes, I don't
really care whether there's a duplicate, and it doesn't matter to me if the
duplicate starts 1 minute after the original or 29 minutes after.

However, if the job normally takes 18 hours, then the conditions are very
different. In this case, I really, really, really don't want a duplicate
running if there's a lot of overlap; this would have a major effect on disk
loads on the client, on network traffic, and on disk/CPU/media resources on
the Bacula server. However, if the original job is near completion when the
duplicate is launched, then I don't want to cancel the duplicate.
In this case, the reasoning is that canceling the duplicate would leave a long
window with no backups, just to close a small window in which duplicate
(simultaneous) backups are running.

Here's a very complicated proposal, which will almost certainly be rejected,
that really leverages Bacula's database backend and gives a really powerful
feature:

    if the job historically takes over $DURATION [minutes|hours|days]
    and the current job is at least $PERCENTAGE complete, then allow
    the duplicate to run; otherwise, kill the duplicate

In this case, $DURATION would be determined from database stats, as an
average of previous runs of the same job at the same level.

I could also see an algorithm that gives more weight to the duration of the
most recent backups if the standard deviation of the average vs. the most
recent backups is greater than a specified value. This is because a given
backup is more likely to take about as long as the most recent backup of the
same level than as long as a much earlier one.

Similarly, the $PERCENTAGE value could be expressed as a range, incorporating
the standard deviation in the backup duration.

[As an aside, I'd like to see this kind of predictive/AI capability put into
more of Bacula, particularly in the scheduling. It would be wonderful to use
the historic records to let Bacula schedule jobs most efficiently, in a way
similar to Amanda, rather than hard-coding specific times in each job
resource.]

=>
=> As you can see, there is a lot of room for clarification of what should be
=> done, and also a need for a bit more functionality ... -- in other words a
=> bit more design is needed before beginning the implementation.
=>
=> Comments?
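For illustration, the rule above, with the recency weighting and the percentage range folded in, might look something like the following. This is a minimal sketch in Python, not Bacula code: Bacula has no such function or directive, and every name here (DuplicatePolicy, expected_duration, percent_complete_range, allow_duplicate, min_duration, min_percent, max_deviation, alpha) is hypothetical. It assumes the durations of previous runs of the same job at the same level have already been pulled from the catalog (in minutes, oldest first), reads $DURATION as a configured threshold that the historical estimate is compared against, and estimates "percent complete" from elapsed time alone. The exponentially weighted moving average is just one possible way to give recent runs more weight.

```python
import statistics
from dataclasses import dataclass


@dataclass
class DuplicatePolicy:
    """Hypothetical settings; none of these are real Bacula directives."""
    min_duration: float         # $DURATION threshold, in minutes
    min_percent: float          # $PERCENTAGE threshold, 0-100
    max_deviation: float = 1.0  # std devs before recent runs get more weight
    alpha: float = 0.5          # weight of each newer run in the moving average


def _check(history):
    if not history or any(d <= 0 for d in history):
        raise ValueError("need at least one positive past duration")


def expected_duration(history, policy):
    """Estimate the next run's length from past runs (minutes, oldest first)."""
    _check(history)
    mean = statistics.mean(history)
    sd = statistics.pstdev(history)
    latest = history[-1]
    # If the latest run is far from the long-term mean, favor recent runs
    # with an exponentially weighted moving average; otherwise use the mean.
    if sd > 0 and abs(latest - mean) > policy.max_deviation * sd:
        ewma = history[0]
        for d in history[1:]:
            ewma = policy.alpha * d + (1 - policy.alpha) * ewma
        return ewma
    return mean


def percent_complete_range(elapsed, history, policy):
    """(pessimistic, optimistic) percent complete, widened by one std dev."""
    est = expected_duration(history, policy)
    sd = statistics.pstdev(history)
    low = min(100.0, 100.0 * elapsed / (est + sd))
    high = 100.0 if est <= sd else min(100.0, 100.0 * elapsed / (est - sd))
    return low, high


def allow_duplicate(elapsed, history, policy):
    """True: let the duplicate run; False: cancel it."""
    est = expected_duration(history, policy)
    low, _ = percent_complete_range(elapsed, history, policy)
    # Use the pessimistic end so a duplicate is allowed only when the running
    # job is very likely near completion.
    return est > policy.min_duration and low >= policy.min_percent


if __name__ == "__main__":
    policy = DuplicatePolicy(min_duration=600, min_percent=90)
    eighteen_hours = [1080, 1100, 1070, 1090]
    print(allow_duplicate(1050, eighteen_hours, policy))  # True: ~96% done
    print(allow_duplicate(120, eighteen_hours, policy))   # False: ~11% done
```

Testing the pessimistic end of the range errs toward cancelling the duplicate, which matches the concern about heavy overlap on long jobs; a site more worried about the gap in backups could test the optimistic end instead.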
=>
=> Best regards,
=>
=> Kern
=>

----
Mark Bergman [EMAIL PROTECTED] 215-662-7310
System Administrator
Section of Biomedical Image Analysis
Department of Radiology
University of Pennsylvania
PGP Key at: https://www.rad.upenn.edu/sbia/bergman

Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users