Re: What is the best way to implement time-based / cronjob actions in a Django app?

Shawn Milochik Wed, 13 Oct 2010 13:43:25 -0700

On Oct 13, 2010, at 4:11 PM, ringemup wrote:

> 
>> It's surprisingly easy to get set up with nothing more than the 
>> tutorial/intro for django-celery. If anyone has problems with it I'd be happy 
>> to try to assist.
> 
> Thanks, I might take you up on that.
> 
>> Although getting everything working is fairly easy, in my opinion the docs 
>> aren't too clear on how the big picture really works for first-timers.
> 
> Yeah, that's a big reason I never tried to use it.  Would you be
> willing to share a high-level overview with us?
> 
> Thanks!


Okay, so here's how it works, as I understand it. I hope Brian will jump in and 
correct where necessary.

So, as I see it, there are three moving parts.

1. Your application.

    A. Your application will have, somewhere, some configuration information 
which allows it to connect to the message broker. 
    B. It will also have one or more files containing callable code (probably 
functions), which are decorated with a Celery decorator. These are referred to 
as "tasks".
    C. It will have other code which will call these decorated functions when 
you want things to run asynchronously (in your views, for example).

2. The broker (traditionally RabbitMQ).

    A. The broker runs as a service, often (though not necessarily) on another machine.
    B. The broker knows nothing about your code or applications.
    C. The broker simply receives messages, holds onto them, and passes them on 
when requested.

3. The Celery Daemon (the simplest use-case)

    A. The Celery daemon is a separate process running on the same machine as 
your application.
    B. The Celery daemon uses the same config info (probably the same config 
file) as your application.
    C. The Celery daemon polls the broker regularly, looking for tasks.
    D. When the daemon retrieves a task, it runs it, using the code in your 
application's "tasks" files.
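To make the division of labour concrete, here is a toy, single-process model of the three parts in plain Python. This is not Celery's API: InMemoryBroker, task, REGISTRY and run_worker are hypothetical names made up for illustration, and a real broker such as RabbitMQ runs as a separate service. What it tries to show is that the broker only stores opaque messages, while the worker is the piece that maps a task name back to your code.

```python
import json
from collections import deque


# Hypothetical stand-in for a broker such as RabbitMQ: it stores opaque
# strings and hands them out in order. It knows nothing about your code.
class InMemoryBroker:
    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def fetch(self):
        # Return the next message, or None if nothing is waiting.
        return self._messages.popleft() if self._messages else None


# Hypothetical task registry, playing the role of the tasks files that
# both "your application" and the worker import.
REGISTRY = {}


def task(func):
    """Register func by name; only loosely modelled on Celery's @task."""
    REGISTRY[func.__name__] = func
    return func


@task
def add(x, y):
    return x + y


def run_worker(broker):
    """Drain the broker, running each named task; return the results."""
    results = []
    while (raw := broker.fetch()) is not None:
        message = json.loads(raw)
        func = REGISTRY[message["task"]]  # look the code up by name
        results.append(func(*message["args"]))
    return results


# "Your application" only publishes a description of the call.
broker = InMemoryBroker()
broker.publish(json.dumps({"task": "add", "args": [2, 3]}))
broker.publish(json.dumps({"task": "add", "args": [10, 20]}))
print(run_worker(broker))  # [5, 30]
```

In real Celery the worker is a separate process and the registry is built from the task modules it is configured to load, but the flow is the same idea.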

Basic working example:

        1. You have a function in your tasks.py called update_user. It accepts 
an integer as its only argument, which should be the primary key of a user in 
your User table. It is decorated by the Celery decorator "task."

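        from datetime import datetime

        from celery.decorators import task  # Celery 2.x; newer: celery.shared_task
        from django.contrib.auth.models import User

        @task
        def update_user(pk):
            # Trivial sample task: mark the user as having just logged in.
            user = User.objects.get(pk=pk)
            user.last_login = datetime.now()
            user.save()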

        2. Your application imports your update_user function from your tasks 
file. One of your views calls it like this: 
update_user.delay(request.user.pk). 
        Note that you don't write delay() yourself; it is a method provided by 
the Celery task decorator. 
        This call to update_user.delay() returns immediately with an AsyncResult 
object whose id is a UUID, which you may store for later retrieval of the 
results.

        3. Celery sends a serialized message describing this call (the task's 
name plus its arguments) to the broker; conceptually, something like 
"update_user(123)".

        4. The Celery daemon, in its continual polling process, is handed a 
message containing something like "update_user(123)". It knows about the 
update_user function because it has been configured to use the tasks files in 
your application, so it calls update_user with the argument 123. At this point 
your code runs. The Celery daemon then records the result using whatever result 
backend is specified in your Celery config file: a database such as MongoDB, 
the broker itself, or one of several others. Celery can also be configured to 
e-mail you if the code fails.

        5. (Optional) At a later time, your application uses the UUID it 
received in step 2 to check the status of the job. If the result was stored 
with the broker, it can only be retrieved once: to the broker it is just 
another plain-text message, and once it has been passed on it is no longer 
stored. If the result was stored in a database (such as PostgreSQL or MongoDB), 
you can request it repeatedly.
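Here is a toy sketch of steps 2 through 5, mainly to show the difference between a result kept as a broker message and one kept in a database. Everything here (ResultStore, delay, work_once, broker_queue) is hypothetical and single-process; in real Celery, delay() returns an AsyncResult, and you can later rebuild one from the saved id with celery.result.AsyncResult(task_id) to check .status or call .get().

```python
import json
import uuid
from collections import deque


class ResultStore:
    """Hypothetical result backend used to contrast the two behaviours."""

    def __init__(self, consume_on_read):
        # True: behaves like a broker message (gone after one read).
        # False: behaves like a database row (can be read repeatedly).
        self.consume_on_read = consume_on_read
        self._results = {}

    def save(self, task_id, value):
        self._results[task_id] = value

    def fetch(self, task_id):
        if self.consume_on_read:
            return self._results.pop(task_id, None)
        return self._results.get(task_id)


def update_user(pk):
    # Stand-in for the real task body; it just reports what it did.
    return f"updated user {pk}"


TASKS = {"update_user": update_user}
broker_queue = deque()  # plays the role of the broker


def delay(name, *args):
    """Steps 2-3: serialize the call, queue it, and return a UUID for it."""
    task_id = str(uuid.uuid4())
    broker_queue.append(json.dumps({"id": task_id, "task": name, "args": list(args)}))
    return task_id


def work_once(store):
    """Step 4: the worker runs the next queued call and records its result."""
    message = json.loads(broker_queue.popleft())
    result = TASKS[message["task"]](*message["args"])
    store.save(message["id"], result)


for store in (ResultStore(consume_on_read=True), ResultStore(consume_on_read=False)):
    task_id = delay("update_user", 123)
    work_once(store)
    # Step 5: ask for the result twice using the saved UUID.
    print(store.fetch(task_id), "/", store.fetch(task_id))
# updated user 123 / None
# updated user 123 / updated user 123
```

The first store hands the result over once and then forgets it; the second answers every time you ask.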

I hope this helps, and that others will correct me where I'm blatantly wrong. I 
have intentionally simplified some things so that the basic flow is easier to 
follow; much more complex setups are possible, especially ones that run Celery 
daemons on multiple servers (and multiple daemons on individual servers). For 
example, you may have one server handle communication tasks (such as sending 
e-mail and SMS messages) while another server handles image processing. It may 
make sense to run the lighter tasks on your application server (where your 
Django app lives) and the more resource-intensive work (such as transcoding 
video uploads) on another machine.
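The usual way to split work like that is to route tasks to named queues and start each worker on only the queues it should serve. Celery has a routing setting for this (CELERY_ROUTES in older releases, task_routes in newer ones) and workers can be told which queues to consume (the -Q option), but the route table, queue names and helper functions below are a hypothetical plain-Python sketch of the idea, not Celery's implementation.

```python
# Hypothetical route table: task name -> queue name.
ROUTES = {
    "send_email": "communication",
    "send_sms": "communication",
    "resize_image": "media",
    "transcode_video": "media",
}
DEFAULT_QUEUE = "default"


def route(task_name):
    """Pick the queue a task message should be published to."""
    return ROUTES.get(task_name, DEFAULT_QUEUE)


def tasks_for_worker(queues, pending):
    """A worker started for certain queues only takes tasks routed there."""
    return [name for name in pending if route(name) in queues]


pending = ["send_email", "transcode_video", "update_user", "send_sms"]

# The application server handles light communication work...
print(tasks_for_worker({"communication", "default"}, pending))
# ['send_email', 'update_user', 'send_sms']

# ...while a separate machine handles resource-intensive media work.
print(tasks_for_worker({"media"}, pending))
# ['transcode_video']
```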

Shawn



-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.