Victor wrote:
> Hi,
>
> I was planning to use Django to create a webapp that runs several
> Python applications in parallel with user input data. The applications
> take a while to run (>2 minutes) when they work in parallel, so I want
> to make a page that says "Processing..." and show the output of each
> of the applications as they finish. Is there a way to do this without
> using threading? Does Django even support this functionality?

There are a few ways to implement this. After working in an environment with similar requirements and restrictions, this is how I would do it:
Your site will accept jobs via the web interface, but it only stores the *request* for the work in a database. It won't try to execute the request on its own, or in a thread. An external script periodically checks the database for jobs. If it sees a new job, it sets the status and begins processing. If your worker script is smart enough, it can even update the percentage complete, or record in the database exactly what it is doing.

I've always thought that threading could be a viable way to do this type of task, but the more I've read about it and seen its pitfalls, the more I've decided it's evil. If the webserver crashes, it's much harder to resume the task. The external script doesn't care whether the webserver is even running, and having the jobs in a database gives you a built-in error-recovery mechanism. I've even added jobs to the queue manually at times.

The user's "processing" page should likely be ajax-driven: it polls the site for the status every few seconds until the job is done or there is an error.

The external script will either need to be a daemon, or a cronjob that is run periodically. I prefer making the external script its own daemon.

We have a system that usually only gets one job every few minutes, and a job won't take more than a minute to complete; most take less than a second. Occasionally, though, we need to run hundreds of jobs at once. Since this is a "polling" model, a delay between database checks is usually a good idea, but even a two-second delay between each of 500 jobs adds up to a huge hit exactly when we need the large batches done. The solution is to check the database for new jobs and then run *all* new jobs currently in the database; after processing them all, the script waits two seconds and polls again. The 500 jobs (most of which take a fraction of a second) get done very quickly, and the database doesn't get hammered by the polling.

If daemonizing your own script isn't an option, a similar result can be achieved with a cron job. In that case I'd suggest using a lock file so overlapping runs don't bog the system down.

The external script can still use your models and the Django ORM, even though it isn't a web application.

I'm working on a set of examples of things I've implemented using Django, but I haven't gotten my example for this one running quite yet. Hopefully this is enough to give you an idea of how to design things. I don't really have sharable production code, but the concept is fairly straightforward (I hope), and I've tacked a few rough sketches of the moving parts onto the end of this message. I'm happy to clarify anything if needed.

Happy Coding!

Jeff Anderson
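To make the job-queue idea concrete, here is a minimal sketch of what the model and the polled status view might look like. The app name (jobs), model name (Job), and field names are all placeholders, not anything from a real project, and the view assumes a Django version recent enough to have JsonResponse. Your submit view would simply create a Job row and redirect to the "processing" page; it never does the work itself.

# jobs/models.py -- minimal job-queue model (all names illustrative)
from django.db import models


class Job(models.Model):
    STATUS_CHOICES = (
        ("pending", "Pending"),
        ("running", "Running"),
        ("done", "Done"),
        ("error", "Error"),
    )

    status = models.CharField(max_length=10, choices=STATUS_CHOICES,
                              default="pending")
    progress = models.PositiveSmallIntegerField(default=0)  # percent complete
    input_data = models.TextField()        # whatever the user submitted
    result = models.TextField(blank=True)  # output, filled in when finished
    created = models.DateTimeField(auto_now_add=True)
    updated = models.DateTimeField(auto_now=True)


# jobs/views.py -- status endpoint the ajax "processing" page polls
from django.http import JsonResponse
from django.shortcuts import get_object_or_404

from jobs.models import Job


def job_status(request, job_id):
    job = get_object_or_404(Job, pk=job_id)
    return JsonResponse({
        "status": job.status,
        "progress": job.progress,
        "result": job.result if job.status == "done" else "",
    })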
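And a sketch of the stand-alone daemon. The settings module ("mysite.settings") and process_job() are placeholders for your project and your actual processing code; the important part is the shape of the loop: drain everything that's pending, then sleep, then poll again.

#!/usr/bin/env python
# worker.py -- stand-alone daemon; this is not a web process.
# "mysite.settings" and process_job() below are placeholders.
import os
import time

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
django.setup()

from jobs.models import Job


def process_job(job):
    # Placeholder: run your actual application here with job.input_data
    # and store whatever it produces.
    job.result = "finished"


def main():
    while True:
        # Drain *all* pending jobs before sleeping, so a burst of 500 jobs
        # isn't slowed down by a per-job delay.
        for job in Job.objects.filter(status="pending").order_by("created"):
            job.status = "running"
            job.save(update_fields=["status"])
            try:
                process_job(job)
                job.status = "done"
                job.progress = 100
            except Exception:
                job.status = "error"
            job.save()
        time.sleep(2)  # polling delay; keeps the database from being hammered


if __name__ == "__main__":
    main()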
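If you go the cron route instead, the same drain logic runs once per invocation, guarded by a lock file so overlapping cron runs don't pile up. Again, the lock path and settings module are assumptions, and the flock error handling assumes Python 3.

#!/usr/bin/env python
# run_jobs_once.py -- cron-friendly variant: drain pending jobs once, then exit.
import fcntl
import os
import sys

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
django.setup()

from jobs.models import Job

LOCK_PATH = "/tmp/django-job-worker.lock"


def main():
    lock_file = open(LOCK_PATH, "w")
    try:
        # Non-blocking exclusive lock: if the previous cron run is still
        # working, exit quietly instead of starting an overlapping worker.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(0)

    for job in Job.objects.filter(status="pending").order_by("created"):
        job.status = "running"
        job.save(update_fields=["status"])
        try:
            # Replace this with the real processing call.
            job.result = "finished"
            job.status = "done"
            job.progress = 100
        except Exception:
            job.status = "error"
        job.save()


if __name__ == "__main__":
    main()

A crontab entry along the lines of "* * * * * python /path/to/run_jobs_once.py" would then pick up new jobs every minute.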