On Mon, Mar 06, 2006 at 09:32:27PM +0100, Kern Sibbald wrote:
> Hello,
>
> On Monday 06 March 2006 19:48, Enrique de la Torre Gordaliza wrote:
> > Hi all!
> >
> > I'm a bit confused by the "Reschedule on Error" behavior. I'm running a
> > 1.38.5 server on a Xeon EM64T Linux (Debian GNU/Linux) server.
> >
> > I've added the following to my job configuration (my workstation is not
> > always online):
> >
> >   Rerun Failed Levels = yes
> >   Reschedule on Error = yes
> >   Reschedule Interval = 30   # (short time just for test)
> >   Reschedule Times = 2
> >   Run Before Job = "nc -w5 -z XXX.XXX.XXX.XXX 9102"
> >
> > I stop the file daemon, run this job from bconsole, and wait a few
> > minutes. It fails, but I get no mail about it (mail has been working for
> > Backup Failed and OK notifications, and for operator notifications too).
> > I've found some empty .mail files in the working directory:
> >
> >   :/usr/local/bacula/bin/working# ls -1l *.mail
> >   -rw-r----- 1 bacula bacula 0 2006-03-06 18:42
> >   neptuno-dir.Sunipx1.2006-03-06_18.42.31.8031880.mail
> >
> > corresponding to this job. If I run "status dir" at bconsole:
> >
> >   Running Jobs:
> >    JobId Level   Name                        Status
> >   ======================================================================
> >      328 Increme Sunipx1.2006-03-06_18.42.31 has a fatal error
> >   ====
> >
> > I cannot clear this "fatal error". Neither the cancel nor the delete
> > command can. It seems this is not the usual behavior of Reschedule on
> > Error, is it? Is it a configuration problem? A compiler problem (it's
> > 64-bit on Linux, but compiled without -O2)?
> >
> > Logs and trace don't show any suspicious error... If I try
> >
> >   Rerun Failed Levels = yes
> >   # Reschedule on Error = yes
> >   # Reschedule Interval = 30   # (short time just for test)
> >   # Reschedule Times = 2
> >   Run Before Job = "nc -w5 -z XXX.XXX.XXX.XXX 9102"
> >
> > it works great, with mail notification about the RunBeforeJob exit
> > status, so it seems to be just a "Reschedule on Error" issue.
> >
>
> First, rescheduling has always been a bit fragile, and I haven't yet made a
> regression script to test it, so there could well be a bug in 1.38.5. That
> said, I don't really see a problem here. When the job fails, it probably
> puts a message in the job report, but the job is then rescheduled and waits
> for the reschedule interval to expire. While it is waiting, there is no way
> to kill it off (in fact, if you do, perhaps it will get confused, as this is
> something I did not test). Only when it is running again will you be able to
> cancel it. The only thing that I see as a bit weird is that you apparently
> have a 30 second restart period. However, if you modified this value without
> stopping and restarting the Director, it will not be valid (i.e. even if you
> do a reload, the value may not change).
>
> Anyway, if you are really sure that this is not working, it would be worth a
> bug report. I believe that several users *are* successfully using
> rescheduling though ...
>
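For reference, the Run Before Job command quoted above
("nc -w5 -z XXX.XXX.XXX.XXX 9102") only tests whether a TCP connection to the
file daemon port can be opened within five seconds, without sending any data.
A rough Python equivalent is sketched below for anyone who wants to reproduce
the check without nc. The function name port_is_open is made up for this
illustration and is not part of Bacula; port 9102 and the 5-second timeout are
taken from the command above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, not part of Bacula: it mirrors what
    "nc -w<timeout> -z host port" tests, i.e. connect and close
    without sending any data.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A wrapper script used as a Run Before Job command would exit with status 0
when port_is_open(host, 9102) returns True and with a non-zero status
otherwise, which is the same convention nc follows here.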
I have a 30-second restart period just for testing. I had the same problem
with a 5-hour period (my first attempt).

This is my first Bacula installation, so I don't know the expected behavior
of Reschedule on Error. I mean, I don't know whether I should receive an
email once all of the Director's attempts have failed. The empty files in the
working directory make me believe that I should, but there seems to be a
problem (although I do receive all other notifications).

I also don't know whether, after all attempts have failed, the fatal error
status should be shown under "Terminated Jobs:" or under "Running Jobs:" in
the status dir output. If it has failed all attempts, why is it listed under
"Running Jobs:"?

I try to cancel the job after all attempts, not while it is waiting. I try to
cancel it to move it from "Running Jobs" to "Terminated Jobs" after the last
try, but I can't. It seems that the job has a problem finishing on the last
try (no mail is sent and the status is not set correctly).

If I start the file daemon while the job is waiting for the next try, it
works perfectly and I get an OK backup status after a while.

Thanks in advance,
Enrique

> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users