Victor wrote:
> Hi,
>
> I was planning to use Django to create a webapp that runs several
> Python applications in parallel with user input data. The applications
> take a while to run (>2 minutes) when they work in parallel, so I want
> to make a page that says "Processing..." and show the output of each
> of the applications as they finish. Is there a way to do this without
> using threading? Does Django even support this functionality?
>   
There are a few ways to implement this. I've worked in an environment
with similar requirements and restrictions, and this is how I would do it:

Your site will accept jobs via the web interface, but it only stores the
*request* for the work in a database. It won't try to execute the
request itself, either inline or in a thread.
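
To make this concrete, here's a minimal sketch of what the job model
might look like (the model and field names are my own invention, not
anything standard):

    from django.db import models

    class Job(models.Model):
        STATUS_CHOICES = [
            ('new', 'New'),
            ('running', 'Running'),
            ('done', 'Done'),
            ('error', 'Error'),
        ]
        input_data = models.TextField()  # whatever the user submitted
        status = models.CharField(max_length=10, choices=STATUS_CHOICES,
                                  default='new')
        progress = models.PositiveSmallIntegerField(default=0)  # percent done
        result = models.TextField(blank=True)
        created = models.DateTimeField(auto_now_add=True)

The view that accepts the submission just creates a Job row and
redirects to the "processing" page; nothing long-running happens inside
the request cycle.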

An external script periodically checks the database for jobs. If it sees
a new job, it sets the status and begins processing. If your worker
script is smart enough, it can even update the percentage complete, or
record in the database exactly what it's working on. I used to think
that threading could be a viable way to do this type of task, but the
more I've read about it, and the more of its pitfalls I've seen, the
more I've decided that it's evil. With threads, a webserver crash makes
it much harder to resume the task. The external script doesn't care
whether the webserver is even running, and keeping the jobs in a
database gives you a built-in error recovery mechanism. I've even added
jobs to the queue manually at times.
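
Roughly, the per-job part of the worker might look like this (the
run_application() generator stands in for your actual long-running
task; it's purely hypothetical):

    def process(job):
        job.status = 'running'
        job.save()
        try:
            # run_application() is assumed to yield a percent-complete
            # figure as it works through the input
            for percent in run_application(job.input_data):
                job.progress = percent
                job.save()
            job.status = 'done'
        except Exception as exc:
            job.status = 'error'
            job.result = str(exc)
        job.save()

Because every state change lands in the database, a crashed worker can
be restarted, and any job still marked 'running' can be re-queued or
flagged for inspection.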

The user's "processing" page should likely be ajax-driven. It'd poll the
site for the status every few seconds until the job is done, or there is
an error.
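
On the Django side, the poll can hit a tiny JSON view (this assumes the
Job model sketched above, and a Django recent enough to have
JsonResponse; on older versions you'd build the same thing with
json.dumps() and HttpResponse):

    from django.http import JsonResponse

    from myapp.models import Job  # hypothetical app name

    def job_status(request, job_id):
        job = Job.objects.get(pk=job_id)
        return JsonResponse({
            'status': job.status,
            'progress': job.progress,
            'result': job.result if job.status == 'done' else '',
        })

The page's javascript polls this URL every few seconds and swaps in the
output once the status comes back as 'done' or 'error'.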

The external script will either need to be a daemon or a cronjob that
runs periodically. I prefer making the external script its own daemon.
We have a system that usually gets only one job every few minutes, and
no job takes more than a minute to complete; most finish in under a
second. Occasionally, though, we need to run hundreds of jobs all at
once. Since it uses a "polling" model, a delay between database checks
is usually a good idea, but even a two-second delay between each of 500
jobs adds up to a serious performance hit when one of those large
batches comes in. The solution is to check the database for new jobs,
and then run *all* new jobs that are currently in the database. Only
after processing the whole batch does it wait two seconds and poll
again. The 500 jobs (most of which take a fraction of a second) get
done very quickly, and the database doesn't get hammered by our polling.
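
As a sketch, the daemon's main loop is just the process() function from
earlier wrapped in a drain-then-sleep cycle:

    import time

    def main():
        while True:
            # Run *every* job that is new right now, with no delay
            # between jobs...
            for job in Job.objects.filter(status='new'):
                process(job)
            # ...and only sleep between polling passes.
            time.sleep(2)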

If daemonizing your own script isn't an option, a similar objective can
be achieved with a cron job. I'd suggest using a lock file so that
overlapping cron runs exit early instead of piling up and bogging down
the system.
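
On Unix, one common way to do the lock file is fcntl.flock(), so that an
overlapping cron run simply exits instead of starting a second worker
(the lock path here is arbitrary):

    import fcntl
    import sys

    lockfile = open('/tmp/job_worker.lock', 'w')
    try:
        fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        sys.exit(0)  # a previous run is still working; bail out quietly

    for job in Job.objects.filter(status='new'):
        process(job)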

The external script can still use your models and the Django ORM, even
though it isn't a web application.
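
The only extra boilerplate is pointing the script at your settings
before importing any models (assuming a project called "mysite"; on
recent Django you also call django.setup()):

    import os
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

    import django
    django.setup()

    from myapp.models import Job  # now the ORM works as usual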

I'm working on a set of examples of things I've implemented using
Django, but I haven't gotten my example for this one running quite yet.
Hopefully this is enough to give you an idea as to how to design things.
The snippets above are rough sketches rather than tested, sharable code,
but the concept is fairly straightforward (I hope). I'm happy to clarify
anything if needed.

Happy Coding!


Jeff Anderson
