On 08/26/2014 11:41 PM, Kurt Griffiths wrote:
> Hi folks,
>
> I ran some rough benchmarks to get an idea of where Zaqar currently stands
> re latency and throughput for Juno. These results are by no means
> conclusive, but I wanted to publish what I had so far for the sake of
> discussion.
>
> Note that these tests do not include results for our new Redis driver, but
> I hope to make those available soon.
>
> As always, the usual disclaimers apply (i.e., benchmarks mostly amount to
> lies; these numbers are only intended to provide a ballpark reference; you
> should perform your own tests, simulating your specific scenarios and
> using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[8] servers to
> mitigate noisy-neighbor effects when running the performance tests:
>
> * 1x Load Generator
>     * Hardware
>         * 1x Intel Xeon E5-2680 v2 2.8GHz
>         * 32 GB RAM
>         * 10Gbps NIC
>         * 32GB SATADOM
>     * Software
>         * Debian Wheezy
>         * Python 2.7.3
>         * zaqar-bench from trunk with some extra patches[1]
> * 1x Web Head
>     * Hardware
>         * 1x Intel Xeon E5-2680 v2 2.8GHz
>         * 32 GB RAM
>         * 10Gbps NIC
>         * 32GB SATADOM
>     * Software
>         * Debian Wheezy
>         * Python 2.7.3
>         * zaqar server from trunk @47e07cad
>             * storage=mongodb
>             * partitions=4
>             * MongoDB URI configured with w=majority
>         * uWSGI + gevent
>             * config: http://paste.openstack.org/show/100592/
>             * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>     * Hardware
>         * 2x Intel Xeon E5-2680 v2 2.8GHz
>         * 128 GB RAM
>         * 10Gbps NIC
>         * 2x LSI Nytro WarpDrive BLP4-1600[2]
>     * Software
>         * Debian Wheezy
>         * mongod 2.6.4
>             * Default config, except setting replSet and enabling periodic
>               logging of CPU and I/O
>             * Journaling enabled
>             * Profiling on message DBs enabled for requests over 10ms
>
> For generating the load, I used the zaqar-bench tool we created during
> Juno as a stepping stone toward integration with Rally. Although the tool
> is still fairly rough, I thought it good enough to provide some useful
> data[3]. The tool uses the python-zaqarclient library (a rough sketch of
> the client calls involved appears after the first set of results below).
>
> Note that I didn’t push the servers particularly hard for these tests; web
> head CPUs averaged around 20%, while the mongod primary’s CPU usage peaked
> at around 10% with DB locking peaking at 5%.
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best time recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of event
> observers. In this case, the observers easily outpace the producer, making
> this a read-heavy workload.
>
> Options
>     * 1 producer process with 5 gevent workers
>         * 1 message posted per request
>     * 2 observer processes with 25 gevent workers each
>         * 5 messages listed per request by the observers
>     * Load distributed across 4[7] queues
>     * 10-second duration[4]
>
> Results
>     * Producer: 2.2 ms/req, 454 req/sec
>     * Observer: 1.5 ms/req, 1224 req/sec
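>
> For reference, the post/list pattern above boils down to python-zaqarclient
> calls roughly like the following sketch. The module path, the noauth conf,
> and the keyword arguments are from memory of the v1 client and may not
> match a given release exactly; the URL and queue name are illustrative,
> not the ones used in the benchmark.
>
>     import uuid
>
>     from zaqarclient.queues import client
>
>     # Illustrative endpoint/conf; the benchmark pointed at the web head.
>     cli = client.Client('http://web-head:8888', version=1,
>                         conf={'auth_opts': {'backend': 'noauth'}})
>
>     queue = cli.queue('event-queue-0')
>     queue.ensure_exists()  # make sure the queue exists (may be a no-op)
>
>     # Producer worker: post one message per request (the benchmark used a
>     # ~1K payload; this body is just a placeholder)
>     queue.post({'ttl': 300, 'body': {'event_id': str(uuid.uuid4())}})
>
>     # Observer worker: list up to 5 messages per request; echo=True also
>     # returns messages posted by this same client
>     for msg in queue.messages(limit=5, echo=True):
>         print(msg.body)
>
> In the benchmark, each producer/observer process runs many such workers
> concurrently on gevent greenthreads.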
>
> ### Event Broadcasting (Balanced) ###
>
> This test uses the same number of producers and consumers, but note that
> the observers are still listing (up to) 5 messages at a time[5], so they
> still outpace the producers, but not as quickly as before.
>
> Options
>     * 2 producer processes with 10 gevent workers each
>         * 1 message posted per request
>     * 2 observer processes with 25 gevent workers each
>         * 5 messages listed per request by the observers
>     * Load distributed across 4 queues
>     * 10-second duration
>
> Results
>     * Producer: 2.2 ms/req, 883 req/sec
>     * Observer: 2.8 ms/req, 348 req/sec
>
> ### Point-to-Point Messaging ###
>
> In this scenario, I simulated one client sending messages directly to a
> different client. Only one queue is required in this case[6].
>
> Note the higher latency. While running the test, there were 1-2 message
> posts that skewed the average by taking much longer (~100ms) than the
> others to complete. Such outliers are probably present in the other tests
> as well, and further investigation is needed to discover the root cause.
>
> Options
>     * 1 producer process with 1 gevent worker
>         * 1 message posted per request
>     * 1 observer process with 1 gevent worker
>         * 1 message listed per request
>     * All load sent to a single queue
>     * 10-second duration
>
> Results
>     * Producer: 5.5 ms/req, 179 req/sec
>     * Observer: 3.5 ms/req, 278 req/sec
>
> ### Task Distribution ###
>
> This test uses several producers and consumers in order to simulate
> distributing tasks to a worker pool. In contrast to the observer worker
> type, consumers claim and delete messages in such a way that each message
> is processed once and only once (a sketch of the claim/delete loop follows
> the results below).
>
> Options
>     * 2 producer processes with 25 gevent workers each
>         * 1 message posted per request
>     * 2 consumer processes with 25 gevent workers each
>         * 5 messages claimed per request, then deleted one by one before
>           claiming the next batch of messages
>     * Load distributed across 4 queues
>     * 10-second duration
>
> Results
>     * Producer: 2.5 ms/req, 798 req/sec
>     * Consumer
>         * Claim: 8.4 ms/req
>         * Delete: 2.5 ms/req
>         * 813 req/sec (overall)
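>
> To make the consumer behavior concrete, here is a similar sketch of the
> claim/delete loop. As before, the exact client method names and signatures
> are from memory of the v1 client and may differ in a given release, and
> handle() is just a hypothetical stand-in for task processing.
>
>     from zaqarclient.queues import client
>
>     # Same illustrative setup as in the earlier sketch.
>     cli = client.Client('http://web-head:8888', version=1,
>                         conf={'auth_opts': {'backend': 'noauth'}})
>     queue = cli.queue('task-queue-0')
>
>     def handle(body):
>         pass  # hypothetical task-processing stand-in
>
>     # Consumer worker: claim up to 5 messages, delete each one after
>     # processing, then claim the next batch. Claims are what give the
>     # once-and-only-once behavior described above.
>     while True:
>         messages = list(queue.claim(ttl=300, grace=60, limit=5))
>         if not messages:
>             break  # queue drained
>
>         for msg in messages:
>             handle(msg.body)
>             # delete() needs the claim id for claimed messages; the client
>             # is expected to pass it along automatically.
>             msg.delete()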
>
> ### Auditing / Diagnostics ###
>
> This test is the same as performed in Task Distribution, but also adds a
> few observers to the mix:
>
> Options
>     * 2 producer processes with 25 gevent workers each
>         * 1 message posted per request
>     * 2 consumer processes with 25 gevent workers each
>         * 5 messages claimed per request, then deleted one by one before
>           claiming the next batch of messages
>     * 1 observer process with 5 gevent workers
>         * 5 messages listed per request
>     * Load distributed across 4 queues
>     * 10-second duration
>
> Results
>     * Producer: 2.2 ms/req, 878 req/sec
>     * Consumer
>         * Claim: 8.2 ms/req
>         * Delete: 2.3 ms/req
>         * 876 req/sec (overall)
>     * Observer: 7.4 ms/req, 133 req/sec
>
> ## Conclusions ##
>
> While more testing is needed to track performance against increasing
> load (spoiler: latency will increase), these initial results are
> encouraging; turning around requests in ~10 (or even ~20) ms is fast
> enough for a variety of use cases. I anticipate enabling the keystone
> middleware will add 1-2 ms (assuming tokens are cached).
>
> Let’s keep digging and see what we can learn, and what needs to be
> improved.

Kurt,

Thanks a lot for working on this. These results are indeed encouraging
from a performance point of view. I'm looking forward to seeing the
results of these tests on the new Redis driver.

I think the next round should focus on doing the same tests with keystone
enabled, since I'd expect most deployers to use Zaqar with it.

Flavio

>
> @kgriffs
>
> --------
>
> [1]: https://review.openstack.org/#/c/116384/
> [2]: Yes, I know that's some crazy IOPS, but there is plenty of RAM to
> avoid paging, so you should be able to get similar results with some
> regular disks, assuming they are decent enough to support enabling
> journaling (if you need that level of durability).
> [3]: It would be interesting to verify the results presented here using
> Tsung and/or JMeter; zaqar-bench isn't particularly efficient, but it does
> provide the potential to do some interesting reporting, such as measuring
> the total end-to-end time of enqueuing and subsequently dequeuing each
> message (TODO). In any case, I'd love to see the team set up a
> benchmarking cluster that runs 2-3 tools regularly (or as part of every
> patch) and reports the results so we always know where we stand.
> [4]: Yes, I know this is a short duration; I'll try to do some longer
> tests in my next round of benchmarking.
> [5]: In a real app, messages will usually be requested in batches.
> [6]: In this test, the target client does not send a response message back
> to the sender. However, if it did, the test would still only require a
> single queue, since in Zaqar queues are duplex.
> [7]: Chosen somewhat arbitrarily.
> [8]: One might argue that the only thing these performance tests show
> is that *OnMetal* is fast. However, as I pointed out, there was plenty
> of headroom left on these servers during the tests, so similar results
> should be achievable using more modest hardware.

-- 
@flaper87
Flavio Percoco

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev