On Mon, Mar 06, 2006 at 09:32:27PM +0100, Kern Sibbald wrote:
> Hello,
> 
> On Monday 06 March 2006 19:48, Enrique de la Torre Gordaliza wrote:
> > Hi all!
> >
> > I'm a bit confused by the "reschedule on error" behavior. I'm running a 1.38.5
> > server on a Xeon EM64T Linux (Debian GNU/Linux) machine.
> >
> > I've added the following to my job configuration (my workstation is not
> > always online):
> >
> >         Rerun Failed Levels = yes
> >         Reschedule on Error = yes
> >         Reschedule Interval = 30  # (short time just for test)
> >         Reschedule Times = 2
> >         Run Before Job = "nc -w5 -z XXX.XXX.XXX.XXX 9102"
> >
> > I stop the file daemon, run this job from bconsole, and wait a few minutes. It
> > fails, but I get no mail about it (mail has been working for Backup Failed and
> > OK notifications, and for operator notifications too). I've found some empty
> > .mail files in the working directory:
> >
> > :/usr/local/bacula/bin/working# ls -1l *.mail
> > -rw-r-----  1 bacula bacula 0 2006-03-06 18:42 neptuno-dir.Sunipx1.2006-03-06_18.42.31.8031880.mail
> >
> > corresponding to this job. If I run "status dir" in bconsole:
> >
> > Running Jobs:
> >  JobId Level   Name                       Status
> > ======================================================================
> >    328 Increme  Sunipx1.2006-03-06_18.42.31 has a fatal error
> > ====
> >
> > I cannot clear this "fatal error"; neither the cancel nor the delete command
> > removes it. It seems this is not the usual behavior of reschedule on error,
> > is it? Is it a configuration problem? A compiler problem (it's 64-bit on
> > Linux, but compiled without -O2)?
> >
> > The logs and trace don't show any suspicious errors... If I try
> >
> >         Rerun Failed Levels = yes
> >  #      Reschedule on Error = yes
> >  #      Reschedule Interval = 30  # (short time just for test)
> >  #      Reschedule Times = 2
> >         Run Before Job = "nc -w5 -z XXX.XXX.XXX.XXX 9102"
> >
> > it works great, with mail notification about the RunBeforeJob exit status,
> > so it seems to be just a "Reschedule on Error" issue.
> >
> 
> First, rescheduling has always been a bit fragile, and I haven't yet made a 
> regression script to test it, so there could well be a bug in 1.38.5.  That 
> said, I don't really see a problem here.  When the job fails, it probably 
> puts a message in the job report, but the job is then rescheduled, and waits 
> for the reschedule interval to expire.  While it is waiting, there is no way 
> to kill it off (in fact, if you do, perhaps it will get confused as this is 
> something I did not test).  Only when it is again running will you be able to 
> cancel it.  The only thing that I see a bit weird is that you apparently have 
> a 30 second restart period.  However, if you modified this value without 
> stopping and restarting the Director it will not be valid (i.e. even if you 
> do a reload, the value may not change).
> 
> Anyway, if you are really sure that this is not working, it would be worth a 
> bug report.  I believe that several users *are* successfully using 
> rescheduling though ...
> 
>

I have a 30-second restart period just for testing. I had the same
problem with a 5-hour period (my first attempt). It's my first Bacula
installation, so I don't know the expected behavior for Reschedule on
Error. I mean, I don't know whether I should receive an email once all of
the Director's attempts have failed. The empty files in the working
directory make me believe that I should, but it seems there is a problem
(although I do receive all other notifications).

I don't know whether, after all attempts have failed, the fatal error status
should be shown under "Terminated Jobs:" in the status dir output, or under
"Running Jobs:". If all attempts have failed, why is it listed under
"Running Jobs:"?

I try to cancel the job after all attempts, not while it's waiting. I try
to cancel it to move it from "Running Jobs" to "Terminated Jobs" after the
last try, but I can't. It seems that the job has a problem finishing on the
last try (no mail is sent and the status is not set correctly). If I start the
file daemon while the job is waiting for the next try, it works perfectly and
I get a backup with an OK status after a while.
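
For anyone trying to reproduce this: the Run Before Job probe quoted above
only checks whether the file daemon port (9102) accepts a TCP connection
within 5 seconds. A minimal Python sketch of an equivalent check follows, in
case nc is not available. The names port_is_open and exit_status are
hypothetical and not part of Bacula, and the sketch assumes (as the
RunBeforeJob mail suggests) that a nonzero exit status is what makes the job
fail.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Roughly what `nc -w5 -z HOST PORT` tests: connect, send nothing, close.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def exit_status(host, port, timeout=5.0):
    """Map the probe result to a shell-style exit status (hypothetical helper).

    0 means the port answered; 1 means it did not.
    """
    return 0 if port_is_open(host, port, timeout) else 1
```

To use it as a command, the script could end with something like
raise SystemExit(exit_status(sys.argv[1], int(sys.argv[2]))) after importing
sys, and the Run Before Job directive would then point at that script.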

Thanks in advance,

Enrique

> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users


