You can't test a system like this by sending one message: you're testing latency, not throughput. Latency is the end-to-end time it takes for a single message to make its way through the system. Throughput is the total number of messages per second that can make their way through. As long as your tasks are not sensitive to delays (SMS messages generally are not), a queueing system can greatly increase the overall throughput.
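To make the distinction concrete, here's roughly what the single-message test measures -- a minimal sketch using boto3 and long polling. QUEUE_URL is a hypothetical placeholder, and the queue, region, and AWS credentials are assumed to already exist; this isn't anyone's actual test code:

    import time
    import boto3

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/test-queue"  # hypothetical
    sqs = boto3.client("sqs")

    start = time.time()
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody="ping")

    # Long-poll until the message comes back; the round trip is the latency.
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        if resp.get("Messages"):
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=resp["Messages"][0]["ReceiptHandle"])
            break

    print("single-message latency: %.2fs" % (time.time() - start))

However fast or slow the number it prints, it tells you nothing about how many messages the queue can move per second.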
Queueing systems are for spreading work around so it can be completed *in aggregate* more quickly and reliably. They're not for reducing the latency of a single message. SQS in particular is architected for massive scale and reliability; to achieve this, the latency for a single message is very high, but it can handle millions and millions of messages per second overall. If you test with a single thread feeding and a single thread reading (as in the amazon-sqs-vs-rabbitmq blog post), you're strictly testing queue latency, not throughput.

The time taken to process all of the messages will look something like this, where:

    Nm = number of messages
    Ts = SQS latency, or 3 to 4s from your tests
    Te = time to ready a message for enqueuing
    Td = time to process a dequeued task
    Ne = number of enqueue workers
    Nd = number of dequeue workers

*As long as Te / Ne <= Td / Nd (i.e. messages are enqueued at least as fast as the dequeue workers can drain them)*, the total time to process Nm messages will look like this:

    Te + Ts + (ceil(Nm / Nd) * Td)

Or:

    <enqueue processing for one message><SQS><Nd tasks being processed in parallel>

You can starve a queueing system on the front as well as the back (which is what that blog post does). So here's a more appropriate test:

    Nm = 100,000 messages
    Ts = 4s
    Te = 20ms, time to ready a message to send
    Td = 200ms, time for the task to process a message
    Ne = 1 thread putting messages on the queue
    Nd = 10 threads pulling messages from the queue

You'll probably find that the entire thing takes about this much time:

    20ms + 4s + (ceil(100,000 / 10) * 200ms), or just over 2004s

Up the enqueue threads to 10 and the dequeue workers to 100:

    20ms + 4s + (ceil(100,000 / 100) * 200ms), or just over 204s

Note that the SQS latency is a constant, however. In other words, any individual message will still take 3-4 seconds to get through the queue, plus whatever your task execution time is. But you'll be processing 10 (or 100) messages at a time through this pipeline. Increase the number of enqueuers and dequeuers and your throughput will scale linearly, assuming you spread the workers amongst enough EC2 instances to handle the load of the tasks themselves. You're trading end-to-end latency for higher throughput.

If you only send 1 message, though, it looks like this with 1, 10, and 100 dequeue workers:

    20ms + 4s + (ceil(1 / 1) * 200ms)   == 4020ms + (1 * 200ms) == 4.22s
    20ms + 4s + (ceil(1 / 10) * 200ms)  == 4020ms + (1 * 200ms) == 4.22s
    20ms + 4s + (ceil(1 / 100) * 200ms) == 4020ms + (1 * 200ms) == 4.22s

So with a single message you're testing latency only, not throughput.
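If you want to check those numbers yourself, here's a quick sketch of the model above in Python (the formula and figures are simply the ones from this post; Ne and Te barely matter as long as the enqueuers keep the dequeuers fed):

    import math

    def total_time(nm, ts, te, td, nd):
        # Te + Ts + (ceil(Nm / Nd) * Td), all times in seconds
        return te + ts + math.ceil(nm / nd) * td

    print(total_time(nm=100000, ts=4.0, te=0.020, td=0.200, nd=10))   # 2004.02
    print(total_time(nm=100000, ts=4.0, te=0.020, td=0.200, nd=100))  # 204.02

    # A single message costs ~4.22s no matter how many workers you add:
    for nd in (1, 10, 100):
        print(total_time(nm=1, ts=4.0, te=0.020, td=0.200, nd=nd))    # 4.22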
For the visual folk out there, here is an amazingly well-rendered ASCII representation of a parallel communication system. Each line is a message, the distance between Start and End is the latency, the height of the stack is the throughput, and the distance from the first start to the last end is the amount of time it takes to process all of the messages.

What you tested:

    (Start ========== End)
    <-------- 4s -------->

What you would test with 5 workers enqueuing and dequeuing in parallel:

    (Start ========== End)
    (Start ========== End)
    (Start ========== End)
    (Start ========== End)
    (Start ========== End)
      (Start ========== End)
    <--------- 4s + N -------->

Where N is based on the parallel execution time of individual tasks by the dequeue workers.

A single RabbitMQ system will have much lower latency, but it won't be able to handle the high aggregate throughput of SQS, and at higher message rates it will fall behind:

    (Start = End)(Start = End)(Start = End)
    (Start = End)(Start = End)(Start = End)
    <-------------------------------------->

Obviously this is neither to scale nor truly representative, but hopefully it helps to illustrate the point. The takeaway is that the more dequeue workers you have, the more overall throughput a system like SQS can give you (modulo EC2 time for RabbitMQ vs. SQS costs, which is a completely different discussion). That said, if you feel like maintaining your own RabbitMQ cluster, with all the upkeep that entails, RabbitMQ may be cheaper for the same throughput at lower message volumes.

Regards,

-scott

On Sunday, April 21, 2013 5:47:40 AM UTC-4, sparky wrote:
>
> One last thing to add: the task itself does not seem to be the issue;
> 'got message from broker' is the 3-4 second wait I can see.
>