In case it is useful to someone, here is the full code I used with locking, using PostgreSQL advisory locks. The benefits of using PostgreSQL's locks are that:
• They lock on the database, so they work across multiple clients
• The locks are automatically released if a client disconnects from the db
• I think they're fast (pg_try_advisory_lock() returns immediately instead of waiting for the lock)
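One caveat with session-level advisory locks: PostgreSQL only releases them automatically when the session ends. If connections are pooled (for example with web2py's `DAL(..., pool_size=...)`), a lock left behind by an exception can outlive the request. A minimal sketch that wraps the try-lock/unlock pair used in `check_daemon` in a context manager so the unlock always runs. `pg_advisory_lock` is a hypothetical helper, not part of web2py, and `executesql` is assumed to take a SQL string plus a placeholder tuple and return a list of rows, as `db.executesql` does on a psycopg2 connection.

```python
from contextlib import contextmanager


@contextmanager
def pg_advisory_lock(executesql, key):
    """Try to take a session-level PostgreSQL advisory lock without blocking.

    Yields True if the lock was acquired, False if another session holds it.
    If acquired, the lock is released on exit, even when the body raises.
    `executesql` is assumed to behave like web2py's db.executesql.
    """
    acquired = executesql("select pg_try_advisory_lock(%s);", (key,))[0][0]
    try:
        yield acquired
    finally:
        if acquired:
            # Session-level locks survive commit/rollback, so release explicitly.
            executesql("select pg_advisory_unlock(%s);", (key,))
```

In `check_daemon`, the `if len(tasks) < 1:` branch could then open with `with pg_advisory_lock(db.executesql, 1) as locked:` and return early when `locked` is false.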
# Runs in models/, where web2py predefines db and request.
# debug() is assumed to be a logging helper defined elsewhere in the app.
from datetime import timedelta

def check_daemon(task_name, period=None):
    period = period or 4
    tasks_query = ((db.scheduler_task.function_name == task_name)
                   & db.scheduler_task.status.belongs(('QUEUED', 'ASSIGNED', 'RUNNING', 'ACTIVE')))

    # Launch a task for task_name if there isn't one already
    tasks = db(tasks_query).select()
    if len(tasks) > 1:    # Check for error
        raise Exception('Too many open %s tasks!!! Noooo, there are %s'
                        % (task_name, len(tasks)))
    if len(tasks) < 1:
        # Non-blocking, database-wide lock; key 1 is an arbitrary app-chosen number
        if not db.executesql('select pg_try_advisory_lock(1);')[0][0]:
            debug('Tasks table is already locked.')
            return
        try:
            # Check again now that we're locked
            if db(tasks_query).count() >= 1:
                debug('Caught a race condition! Glad we got outa there!')
                return
            debug('Adding a %s task!', task_name)
            db.scheduler_task.insert(function_name=task_name,
                                     application_name='utility/utiliscope',
                                     task_name=task_name,
                                     stop_time=request.now + timedelta(days=90000),
                                     repeats=0,
                                     period=period)
            db.commit()
        finally:
            # Always release, even if the insert fails; a pooled connection
            # would otherwise keep holding the lock
            db.executesql('select pg_advisory_unlock(1);')
    elif tasks[0].period != period:
        debug('Updating period for task %s', task_name)
        tasks[0].update_record(period=period)
        db.commit()

check_daemon('process_launch_queue_task')
check_daemon('refresh_hit_status')
check_daemon('process_bonus_queue')


On Tuesday, June 26, 2012 7:57:25 PM UTC-7, Michael Toomim wrote:
>
> All, thank you for the excellent discussion!
>
> I should explain why I posted that recommendation. The "vision" of using
> the scheduler for background tasks was:
>
> "Woohoo, this scheduler will *automatically handle locks*—so I don't need
> to worry about stray background processes running in parallel
> automatically, and it will *automatically start/stop the processes* with
> the web2py server with -K, which makes it much easier to deploy the code!"
>
>
> It turned out:
> • Setting up scheduler tasks was complicated in itself.
>     • 3 static tasks had to be inserted into every new db.
>       This requires new installations of my software to run a setup
>       routine. Yuck.
>     • When I made that automatic in models/, it required locks to avoid
>       a db race condition.
>       (I used postgresql advisory locks. Not cross-platform, but I dunno
>       a better solution.)
>     • The goal was to avoid locks in the first place!
> • When things go wrong, it's harder to debug.
>     • The scheduler adds a new layer of complexity.
>     • Because now I have to make sure my tasks are there properly.
>     • And then look for the scheduler_run instances to see how they went.
>
> I must admit that this second problem would probably go away if we fixed
> all the scheduler's bugs! But it still leaves me uneasy. And I don't like
> having 40,000 scheduler_run instances build up over time.
>
> At this point, I realized that what I really want is a new feature in
> web2py that:
> • Runs a function in models (akin to scheduler's executor function) in a
>   subprocess repeatedly
> • Ensures, with locks etc., that:
>     • Only one is running at a time
>     • It dies if the parent web2py process dies
>
> And it seems better to just implement this as a web2py feature than to
> stranglehold the scheduler into a different design.
>
> Cron's @reboot is very close to this. I used to use it. The problems:
> • I still had to implement my own locks and kills (what I was trying to
>   avoid).
> • It spawns 2 python subprocesses for each cron task (ugly, but not
>   horrible).
> • It was really buggy. @reboot didn't work. I think Massimo fixed this.
> • Syntax is gross.
> I basically just got scared of cron.
> Now I guess I'm scared of everything. :/
>
> Hopefully this detailed report of my experience will be of help to
> somebody. I'm sure that fixing the bugs will make things 5x better. I will
> try your new scheduler.py Niphlod!
>
> On Tuesday, June 26, 2012 12:13:32 PM UTC-7, Niphlod wrote:
>>
>> The problem here started as "I can't ensure my app inserts only one task
>> per function", which is not a scheduler problem "per se": it's a common
>> database problem.
>> It would have been the same if someone created a
>> db.define_table('mytable',
>>     Field('name'),
>>     Field('uniquecostraint')
>>     )
>> and had to ensure, without specifying Field('uniquecostraint',
>> unique=True), that there are no records with the same value in the column
>> uniquecostraint.
>>
>> From there to "now I have tasks stuck in RUNNING status, please avoid
>> using the scheduler" without any further details, the leap is quite
>> "undocumented".
>>
>> And please do note that the scheduler in trunk has undergone some changes:
>> there was a point in time where abnormally killed schedulers (as with
>> kill -SIGKILL on the process) left tasks in RUNNING status that would not
>> be picked up by subsequent scheduler processes.
>>
>> That was a design issue: if a task is RUNNING and you kill the scheduler
>> while the task is being processed, you have absolutely no way to tell what
>> the function did (say, send a batch of 500 emails) before it was actually
>> killed.
>> If the task was not planned properly it could send e.g. 359 mails, be
>> killed, and if it was picked up again by another scheduler after the "first
>> killed round", 359 of your recipients would get 2 identical mails.
>> It has been decided to requeue RUNNING tasks without any active worker
>> processing them (i.e. leave to the function the eventual check of what has
>> been done), so now RUNNING tasks with a dead worker assigned get requeued.
>>
>> With other changes (soon in trunk, the previously attached file) you're
>> able to stop workers, so they may be killed "ungracefully" while being sure
>> that they're not processing tasks.
>>
>> If you need more details, as always, I'm happy to help, and other
>> developers too, I'm sure :D
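Niphlod's point can be made concrete: if the database itself enforces "at most one row per function", duplicate inserts from racing clients are rejected and no advisory lock is needed. A minimal sketch using Python's stdlib `sqlite3` rather than web2py's DAL or PostgreSQL; the `daemon_task` table, `create_schema()` and `ensure_daemon_task()` are hypothetical names for illustration. `INSERT OR IGNORE` is SQLite syntax (PostgreSQL 9.5+ spells it `INSERT ... ON CONFLICT DO NOTHING`).

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical daemon_task table with one row per function."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daemon_task ("
        " id INTEGER PRIMARY KEY,"
        " function_name TEXT NOT NULL UNIQUE,"  # the database rejects duplicates
        " period INTEGER NOT NULL)"
    )


def ensure_daemon_task(conn, function_name, period=4):
    """Make sure exactly one row exists for function_name, with this period."""
    with conn:  # one transaction: commits on success, rolls back on error
        # If another caller already inserted this function, the UNIQUE
        # constraint makes this a no-op instead of creating a duplicate.
        conn.execute(
            "INSERT OR IGNORE INTO daemon_task (function_name, period) VALUES (?, ?)",
            (function_name, period),
        )
        # Mirrors the "update the period if it changed" branch of check_daemon.
        conn.execute(
            "UPDATE daemon_task SET period = ? WHERE function_name = ?",
            (period, function_name),
        )
```

Unlike the status filter in `check_daemon`, a plain UNIQUE column also counts finished rows. If only active tasks should be unique, PostgreSQL (and SQLite 3.8.0+) support partial unique indexes (`CREATE UNIQUE INDEX ... WHERE ...`).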