Hello,

As you know, current job scheduling has a few deficiencies, particularly if 
for some reason your backups get blocked (a bad tape driver or operator 
intervention required), which can lead to a big pile of duplicate jobs being 
scheduled.

We have previously discussed ways of fixing this, with some really good ideas.

I am now ready to take a stab at implementing it, and would like to present 
the current design and let some of you help in the design process.  I am 
currently pretty busy with my own project and helping with two major projects 
that are making very nice progress, so I would appreciate some input.

My current idea is to create a new "DuplicateJobs" resource and a new 
Duplicate Jobs directive in the Job resource, which would point to the 
DuplicateJobs resource.  The reason for the resource is that there are so 
many different variations that they would require a lot of new directives, 
and it seems a shame to add them to every Job.

My current design calls for a Duplicate Jobs resource that looks something 
like the following:

DuplicateJobs {
  Name = "xxx"

  Allow = yes|no          (no = default)

  AllowHigherLevel = yes|no    (no)

  AllowLowerLevel = yes|no     (no)

  AllowSameLevel = yes|no

  Cancel = Running | New        (no)

  CancelledStatus = Fail | Skip  (fail)

  Job Proximity = <time-interval>  (0)

}

The first "Allow" directive is probably not needed, but it does make the 
resource more complete.  If this directive is set to yes, all the other 
directives would be ignored, which would be the same as today's behavior, 
i.e. the same as having no Duplicate Jobs directive in the Job resource.

The AllowXXX directives are an attempt to define which job will be allowed 
to continue when there is one job running or waiting and a new one arrives.
For example, AllowHigherLevel = yes would mean to allow the higher level job 
to continue.
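
To make the AllowXXX idea a bit more concrete, here is a rough sketch in 
Python (not Bacula code) of one possible reading of the rules.  Everything in 
it is an assumption made for illustration: the names DuplicateJobsPolicy and 
new_job_allowed are hypothetical, the level ordering Incremental < 
Differential < Full is assumed, "higher level" is taken to mean that the 
newly arriving job has a higher level than the job already there, and 
AllowSameLevel is assumed to default to no since no default is given above.

```python
from dataclasses import dataclass

# Assumed ordering of backup levels, lowest to highest.
LEVEL_RANK = {"Incremental": 0, "Differential": 1, "Full": 2}


@dataclass
class DuplicateJobsPolicy:
    """Hypothetical in-memory form of the proposed DuplicateJobs resource."""
    allow: bool = False
    allow_higher_level: bool = False
    allow_lower_level: bool = False
    allow_same_level: bool = False  # default not stated in the proposal; assumed no


def new_job_allowed(policy, existing_level, new_level):
    """Return True if the new job may proceed although a job already exists."""
    if policy.allow:
        # Allow = yes: every other directive is ignored (today's behavior).
        return True
    new_rank = LEVEL_RANK[new_level]
    old_rank = LEVEL_RANK[existing_level]
    if new_rank > old_rank:
        return policy.allow_higher_level
    if new_rank < old_rank:
        return policy.allow_lower_level
    return policy.allow_same_level


policy = DuplicateJobsPolicy(allow_higher_level=True)
print(new_job_allowed(policy, "Incremental", "Full"))   # new job is higher
print(new_job_allowed(policy, "Full", "Incremental"))   # new job is lower
```

Note that this only answers "may the new job proceed?"; it says nothing about 
which of the two jobs gets cancelled, which is where the Cancel directive 
comes in.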

The Cancel directive specifies which job to cancel (the new job or the job 
already there).  I think there is probably a logic conflict between this 
directive and the AllowXXX directives, but I have not thought this through 
carefully enough.

The CancelledStatus is an attempt to tell Bacula to either fail one of the two 
jobs or to Skip it, which means to kill it but without a lot of noise.  Some 
options I could think of here that are not yet clearly specified are:

    Do not kill a running job in favor of a newly scheduled job.
    Do not print any messages about cancelling a job (I don't particularly 
       like this idea).
    Do not record any cancelled job in the catalog.
    ...

Finally, Job Proximity is to allow a bit of overlap.  For example, if a job 
has been running 20 minutes or ran 20 minutes ago, you might not want to 
apply the rules.

As you can see, there is a lot of room for clarification of what should be 
done, and also a need for a bit more functionality.  In other words, a bit 
more design is needed before beginning the implementation.

Comments?

Best regards,

Kern

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
