Hi,

I'm writing an application that uses the scheduler heavily. I create new
tasks with this simple function (it calls commit() explicitly because it
runs in an external script):

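import json

# db_scheduler is the DAL connection that holds the scheduler tables
# (defined elsewhere in the script).
def schedule_movie(*args, **kwargs):
    """Queue one 'import_or_update_movie' task for the 'milo' application."""
    db_scheduler.scheduler_task.insert(
        status='QUEUED',
        application_name='milo',
        task_name='schedule_movie',
        function_name='import_or_update_movie',
        args=json.dumps(args),
        vars=json.dumps(kwargs),
        enabled=True,
        # start_time=request.now,
        # stop_time=request.now + datetime.timedelta(days=10),
        repeats=1,
        timeout=3600,
    )
    # Explicit commit, because this runs in an external script.
    db_scheduler.commit()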

I'm using 7 workers to run tasks (started with ./web2py.py -K milo). With
fewer than 100 tasks everything seems fine (I tested up to 50), but as
soon as I schedule 35k tasks the system becomes unstable (high load)
because of the database operations, and a lot of deadlocks begin to
happen between workers.

In particular, it seems that the query that sets the status of a task to
ASSIGNED is not correct. In fact, running this query:

db((db_scheduler.scheduler_task.status!='QUEUED')&(db_scheduler.scheduler_task.status!='COMPLETED')).select()

sometimes returns all the tasks as ASSIGNED (even though I have only 7 workers!).
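
To make that observation easier to check, a count per status and per
worker is more readable than the full select. What follows is only a
minimal sketch under stated assumptions, not my real setup: it uses
Python's sqlite3 with a toy scheduler_task table that has just the
status and assigned_worker_name columns, and the helper name
count_by_status_and_worker and the worker names are made up for
illustration.

```python
import sqlite3


def count_by_status_and_worker(conn):
    """Return {(status, worker_name): task_count} for the toy table."""
    rows = conn.execute(
        "SELECT status, assigned_worker_name, COUNT(*) "
        "FROM scheduler_task "
        "GROUP BY status, assigned_worker_name"
    ).fetchall()
    return {(status, worker): n for status, worker, n in rows}


# Toy stand-in for the real scheduler_task table (assumption: only the
# two columns relevant to this check are modelled).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scheduler_task ("
    "id INTEGER PRIMARY KEY, status TEXT, assigned_worker_name TEXT)"
)
conn.executemany(
    "INSERT INTO scheduler_task (status, assigned_worker_name) VALUES (?, ?)",
    [("ASSIGNED", "worker-1")] * 5
    + [("ASSIGNED", "worker-2")] * 3
    + [("QUEUED", "")] * 2,
)

counts = count_by_status_and_worker(conn)
for key in sorted(counts):
    print(key, counts[key])
```

With 7 workers I would expect the ASSIGNED counts to stay small per
worker; one worker holding thousands of ASSIGNED rows would match what
I am seeing.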

The workers then end up in a deadlock like this:

Traceback (most recent call last):
  File "/home/goshawk/web2py/gluon/shell.py", line 214, in run
    exec(python_code, _env)
  File "<string>", line 1, in <module>
  File "/home/goshawk/web2py/gluon/scheduler.py", line 365, in loop
    MetaScheduler.loop(self)
  File "/home/goshawk/web2py/gluon/scheduler.py", line 257, in loop
    task = self.pop_task()
  File "/home/goshawk/web2py/gluon/scheduler.py", line 395, in pop_task
    grabbed.update(assigned_worker_name='',status=QUEUED)
  File "/home/goshawk/web2py/gluon/dal.py", line 7591, in update
    return self.db._adapter.update(tablename,self.query,fields)
  File "/home/goshawk/web2py/gluon/dal.py", line 1116, in update
    self.execute(sql)
  File "/home/goshawk/web2py/gluon/dal.py", line 1392, in execute
    return self.log_execute(*a, **b)
  File "/home/goshawk/web2py/gluon/dal.py", line 1386, in log_execute
    ret = self.cursor.execute(*a, **b)
TransactionRollbackError: deadlock detected
DETAIL:  Process 7147 waits for ShareLock on transaction 68012578; blocked by process 5038.
Process 5038 waits for ShareLock on transaction 68012565; blocked by process 7147.
HINT:  See server log for query details.

Looking at the database log, I found that the deadlock is indeed in the
query that updates the status to ASSIGNED.


2012-08-01 20:07:03 CEST ERROR:  deadlock detected
2012-08-01 20:07:03 CEST DETAIL:  Process 5724 waits for ShareLock on transaction 68012520; blocked by process 7147.
        Process 7147 waits for ShareLock on transaction 68012547; blocked by process 5724.

        Process 5724: UPDATE scheduler_task SET
        status='ASSIGNED',assigned_worker_name='whisperer#6a223526-9337-4447-bad7-3aac5ab3e261'
        WHERE ((((((((scheduler_task.status IN ('QUEUED','RUNNING'))
        AND ((scheduler_task.times_run < scheduler_task.repeats) OR (scheduler_task.repeats = 0)))
        AND (scheduler_task.start_time <= '2012-08-01 20:06:37'))
        AND (scheduler_task.stop_time > '2012-08-01 20:06:37'))
        AND (scheduler_task.next_run_time <= '2012-08-01 20:06:37'))
        AND (scheduler_task.enabled = 'T'))
        AND (scheduler_task.group_name IN ('main')))
        AND (scheduler_task.assigned_worker_name IN (NULL,'','whisperer#6a223526-9337-4447-bad7-3aac5ab3e261')));

        Process 7147: UPDATE scheduler_task SET
        status='ASSIGNED',assigned_worker_name='whisperer#9f4bae50-24b1-4613-9724-ecfcbc083100'
        WHERE ((((((((scheduler_task.status IN ('QUEUED','RUNNING'))
        AND ((scheduler_task.times_run < scheduler_task.repeats) OR (scheduler_task.repeats = 0)))
        AND (scheduler_task.start_time <= '2012-08-01 20:06:03'))
        AND (scheduler_task.stop_time > '2012-08-01 20:06:03'))
        AND (scheduler_task.next_run_time <= '2012-08-01 20:06:03'))
        AND (scheduler_task.enabled = 'T'))
        AND (scheduler_task.group_name IN ('main')))
        AND (scheduler_task.assigned_worker_name IN (NULL,'','whisperer#9f4bae50-24b1-4613-9724-ecfcbc083100')));
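
As far as I can tell from the logged statements, the UPDATE has no
limit on how many rows it touches, so one worker's statement matches
every eligible task. The sketch below reproduces only the shape of that
WHERE clause against an in-memory SQLite table; it is not the web2py
code, the table is a cut-down assumption, and the function assign_all
and the worker names are hypothetical. SQLite will not reproduce the
deadlock itself, only the "one worker grabs everything" effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scheduler_task ("
    "id INTEGER PRIMARY KEY, status TEXT, assigned_worker_name TEXT)"
)
# Ten queued tasks that no worker has claimed yet.
conn.executemany(
    "INSERT INTO scheduler_task (status, assigned_worker_name) VALUES (?, ?)",
    [("QUEUED", "")] * 10,
)


def assign_all(conn, worker):
    """Run an UPDATE shaped like the logged one; return rows changed."""
    cur = conn.execute(
        "UPDATE scheduler_task "
        "SET status='ASSIGNED', assigned_worker_name=? "
        "WHERE status IN ('QUEUED','RUNNING') "
        "AND assigned_worker_name IN (NULL, '', ?)",
        (worker, worker),
    )
    return cur.rowcount


grabbed_a = assign_all(conn, "worker-a")  # claims every queued task
grabbed_b = assign_all(conn, "worker-b")  # nothing is left to claim

assigned_total = conn.execute(
    "SELECT COUNT(*) FROM scheduler_task WHERE status='ASSIGNED'"
).fetchone()[0]
print(grabbed_a, grabbed_b, assigned_total)
```

If the real statement behaves the same way, two workers issuing it at
nearly the same time would each try to lock a large, overlapping set of
rows, which seems consistent with the deadlocks above, but I have not
verified that this is the cause.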


Could this be a bug in the scheduler? Where can I find the relevant code
in the web2py source tree?

Thank you

-- 
Vincenzo Ampolo
http://vincenzo-ampolo.net
http://goshawknest.wordpress.com
