Hello,

On Monday 06 March 2006 19:48, Enrique de la Torre Gordaliza wrote:
> Hi all!
>
> I'm a bit confused by the "Reschedule on Error" behavior. I'm running a 1.38.5
> server on a Xeon EM64T Linux (Debian GNU/Linux) machine.
>
>
> I've added this to my job configuration (my workstation is not always online):
>
>         Rerun Failed Levels = yes
>         Reschedule on Error = yes
>         Reschedule Interval = 30  # (short time just for test)
>         Reschedule Times = 2
>         Run Before Job = "nc -w5 -z XXX.XXX.XXX.XXX 9102"
>
> I stop the file daemon, run this job from bconsole, and wait a few minutes. It
> fails, but I get no mail about it (mail has been working for Backup Failed and
> OK notifications, and for operator notifications too). I've found some empty
> .mail files in the working directory:
>
> :/usr/local/bacula/bin/working# ls -1l *.mail
> -rw-r-----  1 bacula bacula 0 2006-03-06 18:42 neptuno-dir.Sunipx1.2006-03-06_18.42.31.8031880.mail
>
> corresponding to this job. If I "status dir" at bconsole:
>
> Running Jobs:
>  JobId Level   Name                       Status
> ======================================================================
>    328 Increme  Sunipx1.2006-03-06_18.42.31 has a fatal error
> ====
>
> I cannot clear this "fatal error"; neither the cancel nor the delete command
> removes it. This does not look like the usual behavior of Reschedule on Error,
> or is it? Is it a configuration problem? A compiler problem (it's 64-bit on
> Linux, but compiled without -O2)?
>
> The logs and trace don't show any suspicious errors. If I try
>
>         Rerun Failed Levels = yes
>  #      Reschedule on Error = yes
>  #      Reschedule Interval = 30  # (short time just for test)
>  #      Reschedule Times = 2
>         Run Before Job = "nc -w5 -z XXX.XXX.XXX.XXX 9102"
>
> it works great, with mail notification about the RunBeforeJob exit status, so it
> seems to be just a "Reschedule on Error" issue.
>

First, rescheduling has always been a bit fragile, and I haven't yet made a 
regression script to test it, so there could well be a bug in 1.38.5.  That 
said, I don't really see a problem here.  When the job fails, it probably 
puts a message in the job report, but the job is then rescheduled, and waits 
for the reschedule interval to expire.  While it is waiting, there is no way 
to kill it off (in fact, if you do, perhaps it will get confused, as this is 
something I did not test).  Only when it is again running will you be able to 
cancel it.  The only thing that I see a bit weird is that you apparently have 
a 30 second restart period.  However, if you modified this value without 
stopping and restarting the Director, the new value will not take effect 
(i.e. even if you do a reload, the value may not change).
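
For reference, here is a minimal sketch of the rescheduling directives under 
discussion, as they might appear inside the Job resource.  The file name 
bacula-dir.conf and the written-out "seconds" unit are assumptions made for 
illustration; the values are the ones quoted above, and the rest of the Job 
resource is omitted.

```
# Fragment of a Job resource in bacula-dir.conf (file name assumed;
# the other directives a Job needs are omitted here)
  Rerun Failed Levels = yes
  Reschedule On Error = yes
  Reschedule Interval = 30 seconds   # unit written out (assumption); short value for testing only
  Reschedule Times = 2               # reschedule at most twice
```

After editing these values, stop and restart the Director rather than relying 
on a reload, for the reason given above.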

Anyway, if you are really sure that this is not working, it would be worth a 
bug report.  I believe that several users *are* successfully using 
rescheduling though ...


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users