OK, I played around a bit and produced a sample task that demonstrates
the problem, and also a fix.

In a model file put:

# -*- coding: utf-8 -*-
from datetime import datetime

db.define_table("scheduler_testing",
    Field('start_time', 'datetime', default=datetime.now()),
    Field('my_message', 'string')
    )

def scheduler_testing():
    start_time = datetime.now()
    for i in range(5):
        db.scheduler_testing.insert(start_time=start_time,
                                    my_message='i=%s' % i)
        db.commit()  # commit each row so the inserts survive the failure below
        if i == 3:
            # intentional error
            oops = 1/0
    return None

from gluon.scheduler import Scheduler
scheduler = Scheduler(db)

Then configure the scheduler_task as follows (a programmatic equivalent
using scheduler.queue_task is sketched after the list):
Task Name: Scheduler Error Test
Function Name: scheduler_testing
Args: []
Vars: {}
Enabled: yes
Start Time: Just pick a time
Repeats: 0 (unlimited)
Retry Failed: -1 (unlimited)
Period: 120 seconds
Prevent Drift: yes  <-- this is the critical part for reproducing the bug
Timeout: 20
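
If you prefer to queue the task from code instead of through appadmin,
something like the following should be roughly equivalent. This is only a
sketch; the exact keyword names (in particular prevent_drift) should be
checked against the queue_task signature in your web2py version, and
request.now assumes you are calling this from a controller.

# In a controller or one-off script, assuming the model above has already
# defined scheduler_testing() and created the Scheduler instance.
scheduler.queue_task(
    'scheduler_testing',          # function name registered with the scheduler
    pargs=[],                     # Args
    pvars={},                     # Vars
    task_name='Scheduler Error Test',
    start_time=request.now,       # Start Time: just pick a time
    repeats=0,                    # 0 = unlimited repeats
    retry_failed=-1,              # -1 = unlimited retries
    period=120,                   # seconds between runs
    prevent_drift=True,           # the critical part for reproducing the bug
    timeout=20,
)
db.commit()

Then start a worker the usual way, e.g. python web2py.py -K yourapp.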


Fire up a scheduler process and watch both the scheduler_testing table and
your scheduler_task record. You'll see that records are indeed being
written into scheduler_testing, so the task is working (at least until the
point where the deliberate error is hit). Now take a look at the
scheduler_task record. You'll notice that while the failure count keeps
going up, the Next Run Time value never progresses beyond the first
increment (the next run time gets calculated before the task is actually
run), which results in the task being picked up again almost immediately
rather than waiting the 2 minutes the period asked for. *This behavior only
happens if using the Prevent Drift option.* I suspect that the root problem
is that *upon failure the Times Run value is not being incremented*, and
when using Prevent Drift the Next Run Time is calculated from the original
start time plus the number of runs multiplied by the period in seconds.
Looking through the source code for gluon/scheduler.py I'm pretty sure the
problem is between lines 911 and 932. Specifically, I think the database
update of the task record on lines 928-932 should also be including
times_run=task.times_run.
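
To make that concrete, here is a rough sketch of the Prevent Drift
calculation as I understand it. This is only an illustration of the
behaviour described above, not the actual code in gluon/scheduler.py:

from datetime import datetime, timedelta

def next_run_with_prevent_drift(start_time, times_run, period):
    # The next run is anchored to the original start time, not to "now",
    # so it only moves forward when times_run is incremented.
    return start_time + timedelta(seconds=period * (times_run + 1))

start = datetime(2016, 2, 21, 12, 0, 0)
print(next_run_with_prevent_drift(start, 0, 120))  # 12:02:00
print(next_run_with_prevent_drift(start, 0, 120))  # still 12:02:00 if times_run never advances
print(next_run_with_prevent_drift(start, 1, 120))  # 12:04:00 once times_run is bumped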

So
#gluon/scheduler.py starting at line 911
if task_report.status == COMPLETED:
    d = dict(status=status,
             next_run_time=task.next_run_time,
             times_run=task.times_run,
             times_failed=0
             )
    db(st.id == task.task_id).update(**d)
    if status == COMPLETED:
        self.update_dependencies(db, task.task_id)
else:
    st_mapping = {'FAILED': 'FAILED',
                  'TIMEOUT': 'TIMEOUT',
                  'STOPPED': 'FAILED'}[task_report.status]
    status = (task.retry_failed
              and task.times_failed < task.retry_failed
              and QUEUED or task.retry_failed == -1
              and QUEUED or st_mapping)
    db(st.id == task.task_id).update(
        times_failed=db.scheduler_task.times_failed + 1,
        next_run_time=task.next_run_time,
        status=status
    )

Should probably become

#gluon/scheduler.py starting at line 911
if task_report.status == COMPLETED:
    d = dict(status=status,
             next_run_time=task.next_run_time,
             times_run=task.times_run,
             times_failed=0
             )
    db(st.id == task.task_id).update(**d)
    if status == COMPLETED:
        self.update_dependencies(db, task.task_id)
else:
    st_mapping = {'FAILED': 'FAILED',
                  'TIMEOUT': 'TIMEOUT',
                  'STOPPED': 'FAILED'}[task_report.status]
    status = (task.retry_failed
              and task.times_failed < task.retry_failed
              and QUEUED or task.retry_failed == -1
              and QUEUED or st_mapping)
    db(st.id == task.task_id).update(
        times_failed=db.scheduler_task.times_failed + 1,
        times_run=task.times_run,  # <--- new line to fix the bug
        next_run_time=task.next_run_time,
        status=status
    )
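
As an aside, my reading of the and/or chain that picks the status in the
failure branch, spelled out as plain if/elif/else (just an illustration,
with the status constants redefined locally so it can be run standalone;
this is not code from scheduler.py):

QUEUED, FAILED, TIMEOUT = 'QUEUED', 'FAILED', 'TIMEOUT'

def pick_status(retry_failed, times_failed, st_mapping):
    if retry_failed and times_failed < retry_failed:
        return QUEUED        # retries remain, so requeue the task
    if retry_failed == -1:
        return QUEUED        # -1 means retry forever
    return st_mapping        # out of retries: FAILED or TIMEOUT

print(pick_status(-1, 5, FAILED))   # QUEUED  (unlimited retries)
print(pick_status(3, 1, FAILED))    # QUEUED  (1 of 3 retries used)
print(pick_status(3, 3, TIMEOUT))   # TIMEOUT (retries exhausted)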


I've made the above change to my local copy of gluon/scheduler.py and it
does appear to fix the problem. So if Massimo, Niphlod or another developer
can review and add the patch, I'm sure at least Alex & I would appreciate it.

Brian



On Saturday, February 20, 2016 at 8:21:24 PM UTC-6, Brian M wrote:
>
> My setup is on Windows with the scheduler running as a service via nssm. I 
> have it set to run every 86400 seconds (24hrs) with an infinite number of 
> runs and retries. The timeout is something like 2 or 3 minutes. I am also 
> using the "cron-like" option so that it always runs at exactly the same 
> time of day. I wonder if that cron-like setting might have something to do 
> with it, like under some failure condition the times run field doesn't get 
> updated, so the next run time is the current time instead of getting bumped 
> x seconds forward? Alex, are you also using the cron-like option? 
>
> On Saturday, February 20, 2016 at 12:43:42 PM UTC-6, Alex wrote:
>>
>> Good to know someone else is experiencing this as well. I think there is 
>> some web2py bug in the scheduler since there is nothing special about my 
>> setup (apache on linux). For now I set retry_failed to 0 to avoid that 
>> problem but I'll test it with -1 in a few weeks when I've got more time. Do 
>> you have the values of this task from the scheduler_task table? Maybe it 
>> helps to find out what the problem is.
>>
>> Alex
>>
>>
