Michael --

> Here's a common scenario. I'm looking for the best implementation using the
> scheduler.
>
> I want to support a set of background tasks (task1, task2...), where each
> task:
> • processes a queue of items
> • waits a few seconds
>
> It's safe to have task1 and task2 running in parallel, but I cannot have
> two task1s running in parallel. They will duplicately process the same
> queue of items
> ....
>
> So how can I ensure there is always EXACTLY ONE of each task in the
> database?
This won't solve your installation / setup issue, but I wonder if it would help with the overrun and timeout problems... Instead of scheduling a periodic task, what about having the task reschedule itself? When it's done with the queue, it schedules itself for later. Remove the time limit so it can take whatever time it needs to finish the queue. Or maybe launch a process on startup outside of the scheduler -- when it exhausts the queue, have it sleep and either wake periodically to check the queue, or have it woken when something is inserted. (Rough sketches of both ideas are at the end of this message.) Is the transaction-processing issue you encountered with PostgreSQL preventing you from setting up your queue as a real producer-consumer queue, where you could have multiple workers?

Re. inserting tasks only once: we have a "first run" check in our models to ensure that setup code only runs once -- it only runs if the database is empty -- but that's not adequate if you update code on a running system and add a new task. We added an "update check" using a version number -- we write a breadcrumb file into the models directory with the current version, and then check that against a version in the code that the developers change whenever some update code needs to run or the site needs to take some action. You might do something like that to insert new tasks just once.

(Details: the breadcrumb file is named so it runs first, before the other models, and contains one statement that sets a global with the version number found during the previous models run. The first "real" model file compares that last version against the current version. If the breadcrumb file didn't exist or the version is different, it runs some update code and rewrites the breadcrumb file. IIRC we open the breadcrumb file for exclusive access and spin if it's locked -- I'll need to make sure I actually did that... There's a sketch of this at the end, too.)

I don't think this would help with your case, but I'll mention it... I'm working on chaining scheduler tasks -- letting one task conditionally release held tasks or insert new ones. Our need was different from yours -- we didn't know which task(s) we wanted to run until we had read remote data (via a task for that purpose). So our reader task fetches the data, figures out what needs to run, puts work in queues, and releases previously scheduled tasks. Since this mod made changes like giving all tasks unique names independent of the task function, there may be issues with having a task reschedule itself on an unmodified scheduler that I'm not thinking of.

As an aside, there's always a problem with processing items in a queue (at least if the items are consumable rather than a persistent to-do list), namely: how do you ensure that each item is completely processed, and that the work within one item gets done only once, if the worker processing it might fail in the middle of an item? If the worker takes the item out of the queue before starting work, the item is lost if the worker dies. If it leaves the item in the queue but marks it as being worked on by itself, another worker can redo it, but then runs into the problem of picking up where the previous one left off. For a database, that might be solved with transactions and rollback (assuming that's working...) -- also sketched at the end. This isn't a problem with the scheduler per se; it's a generic queue-processing issue.

I'm probably missing some aspect of your situation, so let me say sorry! in advance if this isn't relevant.

-- Pat
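P.S. In case a concrete sketch helps, here's roughly what I mean by a task that reschedules itself. It assumes web2py's scheduler; the work_queue table and process_item() helper are made-up names, and I haven't run this:

    from datetime import datetime, timedelta
    from gluon.scheduler import Scheduler

    def task1():
        # drain whatever is in the queue right now
        for item in db(db.work_queue.kind == 'task1').select():
            process_item(item)                       # made-up per-item worker
            db(db.work_queue.id == item.id).delete()
            db.commit()
        # done for now -- queue the next run of myself a few seconds out
        scheduler.queue_task(task1,
                             start_time=datetime.now() + timedelta(seconds=5),
                             timeout=24 * 60 * 60,   # big value as a stand-in,
                                                     # since I don't know of an
                                                     # explicit "no limit" setting
                             repeats=1)

    scheduler = Scheduler(db, dict(task1=task1))

You'd still need to queue the very first task1 exactly once (which is where the breadcrumb idea below comes in), and one thing to check is whether the rescheduling insert survives if the task itself fails -- I'm not sure it does, so the chain might stop there.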
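The "process outside the scheduler" alternative is even simpler -- a loop you start once at boot (for example with web2py's -S/-M/-R shell options, if I remember the flags right) that sleeps whenever the queue is empty. Same made-up names, untested:

    # e.g. python web2py.py -S yourapp -M -R applications/yourapp/private/worker.py
    import time

    def worker_loop(poll_seconds=5):
        while True:
            # exhaust the queue...
            for item in db(db.work_queue.kind == 'task1').select():
                process_item(item)                   # made-up per-item worker
                db(db.work_queue.id == item.id).delete()
                db.commit()
            # ...then sleep and wake periodically to re-check
            time.sleep(poll_seconds)

    worker_loop()

Waking it on insert instead of polling would need some notification mechanism on top of this, so I've only sketched the polling version.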
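The version-breadcrumb check looks roughly like this. The file names and run_update_code() are made up, and I've left out the exclusive-lock handling for brevity; model files run alphabetically, which is why the breadcrumb file is named to sort first:

    # models/0_breadcrumb.py -- one statement, rewritten by the update code below
    LAST_SEEN_VERSION = 3

    # models/1_setup.py -- the first "real" model
    import os

    CURRENT_VERSION = 4          # developers bump this when update code must run

    if globals().get('LAST_SEEN_VERSION') != CURRENT_VERSION:
        run_update_code()        # made-up: e.g. insert the new scheduler tasks once
        path = os.path.join(request.folder, 'models', '0_breadcrumb.py')
        with open(path, 'w') as f:
            f.write('LAST_SEEN_VERSION = %d\n' % CURRENT_VERSION)

If the breadcrumb file is missing (fresh install), the global is undefined and the update code runs; in our real version we also lock the breadcrumb file so two requests don't both run the update.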
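And the transactions-and-rollback idea for the generic queue problem might look like this on PostgreSQL, using the DAL's for_update argument if I'm remembering the keyword right (again untested, made-up names):

    def claim_and_process_one():
        # lock one item; the lock and any changes hold until commit/rollback
        item = db(db.work_queue.id > 0).select(
            limitby=(0, 1), for_update=True).first()
        if not item:
            db.commit()
            return False
        try:
            process_item(item)                       # made-up per-item worker
            db(db.work_queue.id == item.id).delete()
            db.commit()                              # the work and the removal
                                                     # commit (or fail) together
        except Exception:
            db.rollback()                            # die mid-item: it stays queued
            raise
        return True

The obvious caveats: only database-side work rolls back (anything process_item does to the outside world doesn't), and with several workers the others will block on the locked row rather than skip past it.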