On Thu, Jun 25, 2015 at 8:55 PM, Thomas Lohman <thom...@mtl.mit.edu> wrote:

> > Ok, so the option "Allow Duplicate Jobs = no" can at least prevent multiple
> > full backups of the same server in a row, as stated before?
>
> As others mentioned, I think it may help in your case but it may not
> completely solve the problem that you saw.  It looks like you had 5
> instances of the same job queued up at the same time.  Disallowing
> duplicate jobs would mean the last 4 would be canceled once queued (but
> after being upgraded to Full).  Now, if we assume your original Full job
> actually ended up running and completed successfully, your next instance
> of this job will still get upgraded to Full I suspect since it's going
> to see the canceled jobs as "newer" than that successful Full.  The
> problem, I think, is what I described in bug 1882:
>
> "The original 5.2.13 behavior when determining if a failed job needs to
> be rerun was to look at the start time of the most recent successful
> backup. From there it would then see if any job had started since then
> and failed. As pointed out, this creates an issue when you have FULL
> jobs that tend to run longer than the time period between normal backups
> for those jobs. i.e. the job laps itself so to speak. Any new jobs would
> be upgraded to FULLs and then canceled since the original FULL was still
> running (this assumes that duplicate jobs are not allowed). But once the
> original FULL finished, Bacula was grabbing its start time and then
> seeing those canceled FULL jobs that happened since the successful FULL
> was started. To me, it seems like looking at the end time of that
> successful job makes more sense."
>
> The change I made was to have Bacula look at the real end time of the
> last successful job and then see if any jobs have failed since that
> time.  This fixed these types of issues for us.  Sorry that this probably
> doesn't help you with fixing it right now if you're running 7.0.x, but I
> think it does explain the behavior that you're seeing and also says that
> it is still there in 7.0.x
>
> And just for completeness, these are the related settings that we run with:
>
> Allow Duplicate Jobs = no
> Cancel Lower Level Duplicates = yes
> Cancel Queued Duplicates = yes
> Cancel Running Duplicates = no
> Rerun Failed Levels = yes
>
> hope this helps,
>
>
> --tom
>

Wouldn't the changed behavior still run into the same problem? Cancelled
duplicates are still seen as failed jobs, so later jobs would still be
upgraded.

For example:

   1. Full starts
   2. Incr is queued, upgraded to Full and cancelled.
   3. Full ends
   4. Incr is queued. It sees that Full job no. 1 finished OK, but it also
   sees that Incr->Full job no. 2 failed, so it is still upgraded to Full
   and started.
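
To make the scenario concrete, here is a toy model in Python. It is only a
sketch, not Bacula's code: the Job fields, the needs_upgrade() helper and the
integer timestamps are all made up for illustration, and I am assuming the
check compares the start time of a failed or cancelled job against a
reference time taken from the last successful Full.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    level: str    # "Full" or "Incremental"
    status: str   # "ok" or "cancelled" (hypothetical status names)
    start: int    # toy timestamps, matching the step numbers above
    end: int


def needs_upgrade(history, use_end_time):
    """Return True if a newly queued Incr would be upgraded to Full.

    use_end_time=False models the start-time check described in the quoted
    bug report; use_end_time=True models the end-time check. Both are
    assumptions about the logic, not the real implementation.
    """
    good_fulls = [j for j in history if j.level == "Full" and j.status == "ok"]
    if not good_fulls:
        # No successful Full on record, so a Full is needed anyway.
        return True
    last_good = max(good_fulls, key=lambda j: j.start)
    reference = last_good.end if use_end_time else last_good.start
    # Upgrade if any job that did not succeed started after the reference.
    return any(j.status != "ok" and j.start > reference for j in history)


# Steps 1-3: the Full runs from t=1 to t=3, and the duplicate is upgraded
# and cancelled at t=2 while the Full is still running.
history = [
    Job(1, "Full", "ok", start=1, end=3),
    Job(2, "Full", "cancelled", start=2, end=2),
]

# Step 4: a new Incr is queued after the Full has ended.
print("start-time check upgrades:", needs_upgrade(history, use_end_time=False))
print("end-time check upgrades:  ", needs_upgrade(history, use_end_time=True))
```

In this toy model the answer depends entirely on which timestamp of the
cancelled job is compared and against what, so that is the detail I am asking
about.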

--
Silver