OK, I've come around. I agree this is the right setup; let's just make sure it's clear in the eventual documentation, as it wasn't obvious to me (not much is these days...): both retries and repeats respect the period. Cool, I like it.
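For the docs, a rough sketch like the one below might help show what "both retries and repeats respect the period" means in practice. It is only a toy model, not the scheduler's actual code: `simulate_runs` and its `outcomes` argument are made-up names, and it assumes that repeats=0 means "repeat forever", that retry_failed=-1 means "retry forever", and that the failure counter resets after a successful run.

```python
from datetime import datetime, timedelta


def simulate_runs(outcomes, start, period, repeats=1, retry_failed=0):
    """Return [(run_time, status)] for a task whose attempts yield `outcomes`.

    Hypothetical model (not web2py's scheduler code):
    - every attempt, whether a first run, a repeat or a retry, is
      scheduled `period` seconds after the previous attempt;
    - repeats=0 means repeat forever, retry_failed=-1 means retry forever;
    - the failure counter resets after a success (an assumption).
    """
    timeline = []
    run_time = start
    completed = 0
    failed = 0
    for ok in outcomes:
        timeline.append((run_time, "COMPLETED" if ok else "FAILED"))
        if ok:
            completed += 1
            failed = 0  # assumption: a success clears earlier failures
            if repeats and completed >= repeats:
                break  # all requested repeats are done
        else:
            failed += 1
            if retry_failed != -1 and failed > retry_failed:
                break  # retries exhausted, task stays failed
        # Retries and repeats alike wait one full period.
        run_time += timedelta(seconds=period)
    return timeline


if __name__ == "__main__":
    # The (repeats=2, retry_failed=3, period=3600) case from the thread.
    runs = simulate_runs(
        outcomes=[False, False, True, False, False, False],
        start=datetime(2012, 8, 18, 2, 0),
        period=3600,
        repeats=2,
        retry_failed=3,
    )
    for when, status in runs:
        print(when.strftime("%H:%M"), status)
```

Under these assumptions no two attempts ever land in the same hour: each failure pushes the next try one period later instead of re-running immediately.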
On Sunday, August 19, 2012 7:13:15 AM UTC-4, Niphlod wrote:
>
> I didn't say that you have to handle exceptions exclusively in your
> functions, but that if you want functionality of the kind "execute this
> for the next 2 minutes and retry ASAP 3 times at most" and you still want
> to have a single scheduler_task record, it's the way to go. Sometimes your
> functions rely on third-party services that are not "handable" in your
> functions: you can manage the exception but you still want to execute that
> function (e.g. you want to send an email but your email server doesn't
> reply). The mail should be sent anyway, possibly as soon as the email
> server is available again... here's where retry_failed comes in handy. Of
> course, if it fails e.g. 10 times, it's better to stop trying and inspect
> the email server :P
>
> On Saturday, August 18, 2012 11:48:41 PM UTC+2, Yarin wrote:
>>
>> OK, I didn't understand that retries happened periodically. I indeed
>> thought that it would retry right away, though I agree with you that that
>> should be handled at the function level. But if we're handling failures
>> within the scheduled function, then now I'm wondering what the value is in
>> having retries at all. Just because the scheduler is running asynchronously
>> does not mean it should necessarily be responsible for the scheduled
>> functions' unhandled exceptions (which is what the failures are, right?). In
>> other words, since our scheduler is scheduling web2py functions in a known
>> environment (unlike environment-agnostic task-queue systems, which don't
>> know how their operations will resolve), shouldn't the onus be on the
>> scheduled function to handle failures and reschedule if necessary? Maybe we
>> should clarify this before discussing the rest. I may be missing something.
>>
>> btw I'm leaving for the night but am interested in finishing the
>> discussion. I'll be back in the morning if you don't hear from me.
>>
>> On Saturday, August 18, 2012 5:14:46 PM UTC-4, Niphlod wrote:
>>>
>>> Ok, got the example (but not the "the last go-round is forgotten" part).
>>> Let's start by saying that your requirements can be fulfilled (simply) by
>>> decorating your function in a loop and breaking after the first successful
>>> attempt (with repeats=0, retry_failed=-1). Given that, the current behaviour
>>> is not really a limit on what you are trying to achieve; it's only a
>>> matter of how to implement the requeue facilities in the scheduler.
>>>
>>> Let's keep the discussion open... if I got it correctly, you're basically
>>> asking to ignore period for failed tasks (requeue them and execute ASAP)
>>> and reset counters accordingly... right? Period right now ensures that no
>>> more than one run of the task gets executed every period seconds (it
>>> protects you from "flapping", i.e. a continuously failing function, and is
>>> somewhat required e.g. for respecting webservice API limits, avoiding
>>> "db pressure" if you're doing heavy operations, etc.).
>>> Respecting period in every case is "consistency" for me (because I
>>> decided that I can "afford" (or "consume" resources for) executing that
>>> function only once every hour).
>>> You are suggesting altering this for repeating tasks... what I didn't
>>> get is whether that is required always or only when repeats=0 (which is,
>>> incidentally, not consistent :P)?!
>>>
>>> i.e. what behaviour would you expect from (repeats=2, retry_failed=3,
>>> period=3600)?
>>> 2.00 am, failed
>>> 2.00 am, failed
>>> 2.00 am, completed
>>> 3.00 am, failed
>>> 3.00 am, failed
>>> 3.00 am, failed
>>> ?
>>> This is basically what I'm missing. What could possibly be wrong at
>>> 2.00 am and be right a few seconds later?
>>>
>>> On Saturday, August 18, 2012 10:32:14 PM UTC+2, Yarin wrote:
>>>>
>>>> I think retry_failed and repeats are two distinct concepts and
>>>> shouldn't be mixed.
>>>>
>>>> For example, a task set to (repeats=0, retry_failed=0, period=3600)
>>>> should be able to fail at 2:00pm and still try again at 3:00pm, regardless
>>>> of what happened at 2:00. Likewise, if it was set to (repeats=0,
>>>> retry_failed=2, period=3600) and failed all three times at 2:00pm, the
>>>> retry count should be reset on the next go-around.
>>>>
>>>> I think it's safer to presume that if a task is set up for indefinite
>>>> repetition, a failure on one repeat should not bring down the whole task.
>>>> Rather, the transactional unit that constitutes a failure should be limited
>>>> to any given attempt, repeated or not.
>>>>
>>>> This was one of the reasons I pressed for renaming repeats_failed to
>>>> retry_failed: they are distinct concepts.
>>>>
>>>> --