Re: [web2py] Re: Best practice using scheduler as a task queue?

Michael Toomim Tue, 26 Jun 2012 20:01:42 -0700

In case it is useful to someone, here is the full code I used with locking, 
using postgresql advisory locks. The benefits of using postgresql's locks 
are:
  • The lock lives in the database, so it works across multiple clients
  • The locks are automatically released if a client disconnects from the db
  • I think it's fast


# Assumes a web2py model file: db is in scope and debug() is a logging
# helper defined elsewhere.
from datetime import datetime, timedelta

def check_daemon(task_name, period=None):
    period = period or 4

    # Any task for this function that is still pending or in progress
    tasks_query = ((db.scheduler_task.function_name == task_name)
                   & db.scheduler_task.status.belongs(('QUEUED',
                                                       'ASSIGNED',
                                                       'RUNNING',
                                                       'ACTIVE')))

    # Launch a task_name task if there isn't one already
    tasks = db(tasks_query).select()
    if len(tasks) > 1:          #  Check for error
        raise Exception('Too many open %s tasks!!!  Noooo, there are %s'
                        % (task_name, len(tasks)))
    if len(tasks) < 1:
        # Non-blocking: returns false if another session holds lock 1
        if not db.executesql('select pg_try_advisory_lock(1);')[0][0]:
            debug('Tasks table is already locked.')
            return

        # Check again now that we're locked
        if db(tasks_query).count() >= 1:
            debug('Caught a race condition! Glad we got outa there!')
            db.executesql('select pg_advisory_unlock(1);')
            return

        debug('Adding a %s task!', task_name)
        # stop_time is far in the future, so the task effectively never stops
        db.scheduler_task.insert(function_name=task_name,
                                 application_name='utility/utiliscope',
                                 task_name=task_name,
                                 stop_time=(datetime.now()
                                            + timedelta(days=90000)),
                                 repeats=0, period=period)
        db.commit()
        db.executesql('select pg_advisory_unlock(1);')

    elif tasks[0].period != period:
        debug('Updating period for task %s', task_name)
        tasks[0].update_record(period=period)
        db.commit()

check_daemon('process_launch_queue_task')
check_daemon('refresh_hit_status')
check_daemon('process_bonus_queue')
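
The try-lock / unlock pair above can also be wrapped so the unlock is harder 
to forget. This is only a sketch, not what I actually ran: advisory_lock is a 
made-up helper name, and executesql stands for any callable that runs SQL 
and returns rows (db.executesql in web2py would be one candidate).

```python
from contextlib import contextmanager


@contextmanager
def advisory_lock(executesql, key=1):
    """Try to take a PostgreSQL session-level advisory lock.

    Yields True if the lock was acquired and False otherwise. The lock is
    released on exit only if this call acquired it.
    """
    key = int(key)  # the key is interpolated into SQL, so force an integer
    rows = executesql('select pg_try_advisory_lock(%d);' % key)
    acquired = bool(rows[0][0])
    try:
        yield acquired
    finally:
        if acquired:
            executesql('select pg_advisory_unlock(%d);' % key)
```

Usage would look like "with advisory_lock(db.executesql) as got_lock:" 
followed by an early return when got_lock is false. One caveat: if the body 
fails with a database error, postgresql rejects further statements until the 
transaction is rolled back, so the unlock itself would fail. The sketch does 
not handle that case.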


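Niphlod's point in the quoted message below, that "only one row per 
function" is a plain database uniqueness problem, can be illustrated without 
any locks. This is a rough sketch using sqlite3 from the standard library and 
a hypothetical task table; it is not the scheduler's own schema, since 
scheduler_task also keeps finished rows with the same function_name.

```python
import sqlite3


def ensure_task(conn, function_name, period=4):
    """Insert a task row unless one with this function_name exists.

    Returns True if this call created the row, False if the UNIQUE
    constraint rejected it because another caller got there first.
    """
    try:
        conn.execute(
            'INSERT INTO task (function_name, period) VALUES (?, ?)',
            (function_name, period))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE task ('
             'function_name TEXT NOT NULL UNIQUE, '
             'period INTEGER NOT NULL)')
```

Here the database itself arbitrates the race: two callers can both attempt 
the insert, and only one of them succeeds.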

On Tuesday, June 26, 2012 7:57:25 PM UTC-7, Michael Toomim wrote:
>
> All, thank you for the excellent discussion!
>
> I should explain why I posted that recommendation. The "vision" of using 
> the scheduler for background tasks was:
>
> "Woohoo, this scheduler will *automatically handle locks*—so I don't need 
> to worry about stray background processes running in parallel 
> automatically, and it will *automatically start/stop the processes* with 
> the web2py server with -K, which makes it much easier to deploy the code!"
>
>  
> It turned out:
>   • Setting up scheduler tasks was complicated in itself.
>      • 3 static tasks had to be inserted into every new db.
>        This requires new installations of my software to run a setup 
> routine. Yuck.
>      • When I made that automatic in models/, it required locks to avoid 
> a db race condition.
>        (I used postgresql advisory locks. Not cross-platform, but I dunno 
> a better solution.)
>      • The goal was to avoid locks in the first place!
>   • When things go wrong, it's harder to debug.
>     • The scheduler adds a new layer of complexity.
>     • Because now I have to make sure my tasks are there properly.
>     • And then look for the scheduler_run instances to see how they went.
>
> I must admit that this second problem would probably go away if we fixed 
> all the scheduler's bugs! But it still leaves me uneasy. And I don't like 
> having 40,000 scheduler_run instances build up over time.
>
> At this point, I realized that what I really want is a new feature in 
> web2py that:
>   • Runs a function in models (akin to scheduler's executor function) in a 
> subprocess repeatedly
>   • Ensures, with locks etc., that:
>      • Only one is running at a time
>      • That it dies if the parent web2py process dies
>
> And it seems better to just implement this as a web2py feature, than to 
> stranglehold the scheduler into a different design.
>
> Cron's @reboot is very close to this. I used to use it. The problems:
>   • I still had to implement my own locks and kills. (what I was trying to 
> avoid)
>   • It spawns 2 python subprocesses for each cron task (ugly, but not 
> horrible)
>   • It was really buggy. @reboot didn't work. I think massimo fixed this.
>   • Syntax is gross.
> I basically just got scared of cron.
> Now I guess I'm scared of everything. :/
>
> Hopefully this detailed report of my experience will be of help to 
> somebody. I'm sure that fixing the bugs will make things 5x better. I will 
> try your new scheduler.py Niphlod!
>
> On Tuesday, June 26, 2012 12:13:32 PM UTC-7, Niphlod wrote:
>>
>> problem here started as "I can't ensure my app to insert only one task 
>> per function", that is not a scheduler problem "per se": it's a common 
>> database problem. Would have been the same if someone created a 
>> db.define_table('mytable',
>>      Field('name'),
>>      Field('uniquecostraint')
>> )
>> and had to ensure, without specifying Field('uniquecostraint', 
>> unique=True) that there are no records with the same value into the column 
>> uniquecostraint.
>>
>> From there to "now I have tasks stuck in RUNNING status, please avoid 
>> using the scheduler" without any further details, the leap is quite 
>> "undocumented".
>>
>> And please do note that scheduler in trunk has undergone some changes: 
>> there was a point in time where abnormally killed schedulers (as kill 
>> -SIGKILL the process) left tasks in RUNNING status, that would not be 
>> picked up by subsequent scheduler processes.
>>
>> That was a design issue: if a task was RUNNING and you killed the 
>> scheduler while the task was being processed, you had absolutely no way 
>> to tell what the function did (say, send a batch of 500 emails) before 
>> it was actually killed. 
>> If the task was not planned properly it could send e.g. 359 mails, be 
>> killed, and if it was picked up again by another scheduler after the "first 
>> killed round" 359 of your recipients would get 2 identical mails.
>> It has been decided to requeue RUNNING tasks without any active worker 
>> doing that (i.e. leave to the function the eventual check of what has been 
>> done), so now RUNNING tasks with a dead worker assigned get requeued.
>>
>> With other changes (soon in trunk, the previously attached file) you're 
>> able to stop workers, so they may be killed "ungracefully" being sure that 
>> they're not processing tasks.
>>
>> If you need more details, as always, I'm happy to help, and other 
>> developers too, I'm sure :D
>>
>>

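Since RUNNING tasks with a dead worker now get requeued, the function itself 
has to check what was already done, as in the 359-mails example above. A 
minimal sketch of that idea follows; send_batch, sent and send are names made 
up for illustration, and in a real task sent would be a db table committed 
after every mail rather than an in-memory set.

```python
def send_batch(recipients, sent, send):
    """Send to each recipient at most once across repeated runs.

    `sent` records who has already been handled, so a requeued run skips
    them. Returns the number of messages delivered by this run.
    """
    delivered = 0
    for address in recipients:
        if address in sent:
            continue  # handled by an earlier, interrupted run
        send(address)
        sent.add(address)  # in a real task: insert a row and commit here
        delivered += 1
    return delivered
```

A kill that lands between send() and the record step can still produce one 
duplicate, so this narrows the problem to a single mail rather than removing 
it entirely.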