You can't test a system like this by sending one message: you're testing latency, not throughput. Latency is the end-to-end time it takes for a single message to make its way through the system. Throughput is the total number of messages per second that can make their way through. As long as your tasks are not sensitive to delays (SMS messages generally are not), a queueing system can greatly increase the overall throughput.
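To make the distinction concrete, here's roughly what the single-message test measures -- a minimal sketch using boto3 and long polling. QUEUE_URL is a hypothetical placeholder, and the queue, region, and AWS credentials are assumed to already exist; this isn't anyone's actual test code:

    import time
    import boto3

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/test-queue"  # hypothetical
    sqs = boto3.client("sqs")

    start = time.time()
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody="ping")

    # Long-poll until the message comes back; the round trip is the latency.
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        if resp.get("Messages"):
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=resp["Messages"][0]["ReceiptHandle"])
            break

    print("single-message latency: %.2fs" % (time.time() - start))

However fast or slow the number it prints, it tells you nothing about how many messages the queue can move per second.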
Queueing systems are for spreading work around so it can be completed *in aggregate* more quickly and reliably. They're not for reducing the latency of a single message. SQS in particular is architected for massive scale and reliability; to achieve this, the latency for a single message is very high, but it can handle millions and millions of messages per second overall. If you test with a single thread feeding and a single thread reading (as in the amazon-sqs-vs-rabbitmq blog post), you're strictly testing queue latency, not throughput.

The time taken to process all of the messages will look something like this, where:

    Nm = number of messages
    Ts = SQS latency, or 3 to 4s from your tests
    Te = time to ready a message for enqueuing
    Td = time to process a dequeued task
    Ne = number of enqueue workers
    Nd = number of dequeue workers

*As long as Te / Ne <= Td / Nd (i.e. messages are enqueued at least as fast as the dequeue workers can drain them)*, the total time to process Nm messages will look like this:

    Te + Ts + (ceil(Nm / Nd) * Td)

Or:

    <enqueue processing for one message><SQS><Nd tasks being processed in parallel>

You can starve a queueing system on the front as well as the back (which is what that blog post does). So here's a more appropriate test:

    Nm = 100,000 messages
    Ts = 4s
    Te = 20ms, time to ready a message to send
    Td = 200ms, time for the task to process a message
    Ne = 1 thread putting messages on the queue
    Nd = 10 threads pulling messages from the queue

You'll probably find that the entire thing takes about this much time:

    20ms + 4s + (ceil(100,000 / 10) * 200ms), or just over 2004s

Up the enqueue threads to 10 and the dequeue workers to 100:

    20ms + 4s + (ceil(100,000 / 100) * 200ms), or just over 204s

Note that the SQS latency is a constant, however. In other words, any individual message will still take 3-4 seconds to get through the queue, plus whatever your task execution time is. But you'll be processing 10 (or 100) messages at a time through this pipeline. Increase the number of enqueuers and dequeuers and your throughput will scale linearly, assuming you spread the workers amongst enough EC2 instances to handle the load of the tasks themselves. You're trading end-to-end latency for higher throughput.

If you only send 1 message, though, it looks like this with 1, 10, and 100 dequeue workers:

    20ms + 4s + (ceil(1 / 1) * 200ms)   == 4020ms + (1 * 200ms) == 4.22s
    20ms + 4s + (ceil(1 / 10) * 200ms)  == 4020ms + (1 * 200ms) == 4.22s
    20ms + 4s + (ceil(1 / 100) * 200ms) == 4020ms + (1 * 200ms) == 4.22s

So with a single message you're testing latency only, not throughput.
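If you want to check those numbers yourself, here's a quick sketch of the model above in Python (the formula and figures are simply the ones from this post; Ne and Te barely matter as long as the enqueuers keep the dequeuers fed):

    import math

    def total_time(nm, ts, te, td, nd):
        # Te + Ts + (ceil(Nm / Nd) * Td), all times in seconds
        return te + ts + math.ceil(nm / nd) * td

    print(total_time(nm=100000, ts=4.0, te=0.020, td=0.200, nd=10))   # 2004.02
    print(total_time(nm=100000, ts=4.0, te=0.020, td=0.200, nd=100))  # 204.02

    # A single message costs ~4.22s no matter how many workers you add:
    for nd in (1, 10, 100):
        print(total_time(nm=1, ts=4.0, te=0.020, td=0.200, nd=nd))    # 4.22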
For the visual folk out there, here is an amazingly well-rendered ASCII representation of a parallel communication system. Each line is a message, the distance between Start and End is the latency, the height of the stack is the throughput, and the distance from the first start to the last end is the amount of time it takes to process all of the messages.

What you tested:

    (Start ========== End)
    <-------- 4s -------->

What you would test with 5 workers enqueuing and dequeuing in parallel:

    (Start ========== End)
    (Start ========== End)
    (Start ========== End)
    (Start ========== End)
    (Start ========== End)
      (Start ========== End)
    <--------- 4s + N -------->

Where N is based on the parallel execution time of individual tasks by the dequeue workers.

A single RabbitMQ system will have much lower latency, but it won't be able to handle the high aggregate throughput of SQS, and at higher message rates it will fall behind:

    (Start = End)(Start = End)(Start = End)
    (Start = End)(Start = End)(Start = End)
    <-------------------------------------->

Obviously this is neither to scale nor truly representative, but hopefully it helps to illustrate the point. The takeaway is that the more dequeue workers you have, the more overall throughput a system like SQS can give you (modulo EC2 time for RabbitMQ vs. SQS costs, which is a completely different discussion). That said, if you feel like maintaining your own RabbitMQ cluster, with all the upkeep that entails, RabbitMQ may be cheaper for the same throughput at lower message volumes.

Regards,

-scott

On Sunday, April 21, 2013 5:47:40 AM UTC-4, sparky wrote:
>
> One last thing to add: the task itself does not seem to be the issue;
> 'got message from broker' is the 3-4 second wait I can see.
>