All, thank you for the excellent discussion!

I should explain why I posted that recommendation. The "vision" of using 
the scheduler for background tasks was:

"Woohoo, this scheduler will *automatically handle locks*—so I don't need 
to worry about stray background processes running in parallel 
automatically, and it will *automatically start/stop the processes* with 
the web2py server with -K, which makes it much easier to deploy the code!"

 
It turned out:
  • Setting up scheduler tasks was complicated in itself.
     • 3 static tasks had to be inserted into every new db. This requires
       new installations of my software to run a setup routine. Yuck.
     • When I made that automatic in models/, it required locks to avoid a
       db race condition. (I used postgresql advisory locks. Not
       cross-platform, but I dunno a better solution.)
     • The goal was to avoid locks in the first place!
  • When things go wrong, it's harder to debug.
     • The scheduler adds a new layer of complexity.
     • Now I have to make sure my tasks are set up properly.
     • Then I have to dig through the scheduler_run records to see how
       they went.
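
As Niphlod points out in the reply quoted below, if the database itself enforces uniqueness, the "insert the static tasks" step can run from models/ on every request without an advisory lock. Here is a minimal sketch using Python's built-in sqlite3 rather than the web2py DAL; the `static_task` table and the three task names are hypothetical placeholders.

```python
import sqlite3

# Hypothetical names for the three "static" tasks every new db needs.
STATIC_TASKS = ("cleanup_sessions", "send_digests", "refresh_cache")


def ensure_static_tasks(conn):
    """Insert each static task at most once, even if called on every request.

    The UNIQUE constraint makes the database reject duplicates, so two
    concurrent callers cannot create two rows for the same function.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS static_task ("
        " id INTEGER PRIMARY KEY,"
        " function_name TEXT NOT NULL UNIQUE)"
    )
    # INSERT OR IGNORE silently skips rows that would violate the constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO static_task (function_name) VALUES (?)",
        [(name,) for name in STATIC_TASKS],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_static_tasks(conn)
ensure_static_tasks(conn)  # second call is a no-op
count = conn.execute("SELECT COUNT(*) FROM static_task").fetchone()[0]
print(count)  # 3
```

With two concurrent callers, the losing INSERT simply becomes a no-op instead of a duplicate row. On PostgreSQL the equivalent is `INSERT ... ON CONFLICT DO NOTHING` (9.5 and later), or catching the unique-violation error on older versions.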

I must admit that this second problem would probably go away if we fixed 
all the scheduler's bugs! But it still leaves me uneasy. And I don't like 
having 40,000 scheduler_run instances build up over time.

At this point, I realized that what I really want is a new feature in
web2py that:
  • Repeatedly runs a function from models (akin to the scheduler's
    executor function) in a subprocess
  • Ensures, with locks etc., that:
     • Only one instance is running at a time
     • It dies if the parent web2py process dies

And it seems better to just implement this as a web2py feature than to
shoehorn the scheduler into a different design.
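
The feature described in the list above can be approximated outside web2py. This is a rough sketch, assuming a POSIX system because it relies on `fcntl.flock`; the `run_repeatedly` name, its `max_runs` parameter and the lock file path are hypothetical, not web2py APIs. An exclusive non-blocking `flock` gives "only one at a time", and polling `os.getppid()` notices when the parent has died, because an orphaned process is re-parented.

```python
import fcntl
import os
import tempfile
import time


def run_repeatedly(task, interval, lock_path, max_runs=None):
    """Call task() every `interval` seconds while holding an exclusive lock.

    Hypothetical helper, not a web2py API. Returns the number of completed
    runs, or -1 if another process already holds the lock.
    """
    lock_file = open(lock_path, "a")
    try:
        # Non-blocking exclusive lock: fails immediately if another
        # instance is already running, so at most one runs at a time.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return -1

    parent_pid = os.getppid()
    runs = 0
    try:
        while max_runs is None or runs < max_runs:
            # An orphaned process is re-parented, so a changed parent PID
            # means the process that started us has died: stop looping.
            if os.getppid() != parent_pid:
                break
            task()
            runs += 1
            time.sleep(interval)
    finally:
        # Closing the file also releases the flock.
        lock_file.close()
    return runs


# Demo: three quick runs in the current process.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "background_task.lock")
calls = []
print(run_repeatedly(lambda: calls.append(time.time()), 0.01, LOCK_PATH, max_runs=3))
```

To run it as a subprocess, one could pass `run_repeatedly` as the `target` of a `multiprocessing.Process`. A `daemon=True` process is terminated when the parent exits normally, but not when the parent is killed with SIGKILL, which is why the sketch also checks the parent PID between runs.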

Cron's @reboot is very close to this. I used to use it. The problems:
  • I still had to implement my own locks and kills (exactly what I was
    trying to avoid).
  • It spawns 2 python subprocesses for each cron task (ugly, but not
    horrible).
  • It was really buggy: @reboot didn't work. I think Massimo fixed this.
  • The syntax is gross.
I basically just got scared of cron.
Now I guess I'm scared of everything. :/

Hopefully this detailed report of my experience will be of help to
somebody. I'm sure that fixing the bugs will make things 5x better. I will
try your new scheduler.py, Niphlod!

On Tuesday, June 26, 2012 12:13:32 PM UTC-7, Niphlod wrote:
>
> The problem here started as "I can't ensure my app inserts only one task
> per function", which is not a scheduler problem per se: it's a common
> database problem. It would have been the same if someone created a
> db.define_table('mytable',
>      Field('name'),
>      Field('uniquecostraint')
> )
> and had to ensure, without specifying Field('uniquecostraint',
> unique=True), that there are no records with the same value in the column
> uniquecostraint.
>
> From there to "now I have tasks stuck in RUNNING status, please avoid 
> using the scheduler" without any further details, the leap is quite 
> "undocumented".
>
> And please do note that the scheduler in trunk has undergone some changes:
> there was a point in time where abnormally killed schedulers (e.g. with
> kill -SIGKILL on the process) left tasks in RUNNING status, and those
> would not be picked up by subsequent scheduler processes.
>
> That was a design issue: if a task is RUNNING and you kill the scheduler
> while the task is being processed, you have absolutely no way to tell what
> the function did (say, send a batch of 500 emails) before it was killed.
> If the task was not planned properly it could send e.g. 359 mails, be
> killed, and if it was picked up again by another scheduler after the
> "first killed round", 359 of your recipients would get 2 identical mails.
> It has been decided to requeue RUNNING tasks that no active worker is
> processing (i.e. leave it to the function to check what has already been
> done), so now RUNNING tasks whose assigned worker is dead get requeued.
>
> With other changes (soon in trunk, in the previously attached file) you're
> able to stop workers, so they can be killed "ungracefully" while being sure
> that they're not processing tasks.
>
> If you need more details, as always, I'm happy to help, and other 
> developers too, I'm sure :D
>
>
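
Niphlod's point about leaving it to the function to check what has already been done can be made concrete: the task records each completed step, so a requeued run skips finished work. A minimal sketch using Python's built-in sqlite3; `send_batch`, the `sent_mail` table, the batch id and the simulated kill are all hypothetical, not part of the web2py scheduler.

```python
import sqlite3


class WorkerKilled(Exception):
    """Stands in for the worker process dying mid-batch (simulation only)."""


def send_batch(conn, batch_id, recipients, send):
    """Send one message per recipient, skipping recipients already recorded.

    Hypothetical task function. Each successful send is committed right away,
    so a requeued run only handles the remainder. A kill between send() and
    commit() can still duplicate that one message, but no more.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sent_mail ("
        " batch_id TEXT NOT NULL,"
        " recipient TEXT NOT NULL,"
        " UNIQUE (batch_id, recipient))"
    )
    already_sent = {
        row[0]
        for row in conn.execute(
            "SELECT recipient FROM sent_mail WHERE batch_id = ?", (batch_id,)
        )
    }
    sent_now = 0
    for recipient in recipients:
        if recipient in already_sent:
            continue
        send(recipient)
        conn.execute(
            "INSERT INTO sent_mail (batch_id, recipient) VALUES (?, ?)",
            (batch_id, recipient),
        )
        conn.commit()
        sent_now += 1
    return sent_now


conn = sqlite3.connect(":memory:")
recipients = [f"user{i}@example.com" for i in range(5)]
outbox = []


def dying_send(address):
    # Simulate the worker being killed after three messages went out.
    if len(outbox) == 3:
        raise WorkerKilled()
    outbox.append(address)


try:
    send_batch(conn, "newsletter-1", recipients, dying_send)
except WorkerKilled:
    pass  # the scheduler would later requeue this RUNNING task

resent = send_batch(conn, "newsletter-1", recipients, outbox.append)
print(len(outbox), resent)  # 5 2
```

The second run sends only the two remaining messages, so nobody gets a duplicate, which is the kind of check a task needs once RUNNING tasks with a dead worker are requeued.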
