Hi,

I'm interested in benchmarking the impact of compaction on producers and consumers and on long-term cluster stability. That's not *quite* the impact of it on the server side, but it certainly plays into it. For example, I'd like to be able to answer: "In configuration X, if we write N messages into a compacted topic with a certain key range, a certain number of deletes, etc., *then* replay that into a consumer that does nothing, how long does that consumer take?" What happens if we're continually running that compaction process, along with restarting consumers once every hour or two, *and* producing a lot of messages? What happens to perf on compacted topics with different disk configurations (e.g. magnetic vs SSD, RAID vs JBOD)?
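
To make that concrete, here's a very rough sketch of the kind of keyed, tombstone-emitting producer loop I have in mind (plain Java client; the topic name, key range, message count and delete ratio below are just illustrative placeholders, not proposed defaults):

    import java.util.Properties;
    import java.util.Random;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompactedTopicLoadSketch {
        public static void main(String[] args) {
            // All values here are illustrative, not real defaults.
            String topic = "compaction-bench";   // hypothetical compacted topic
            int keyRange = 10000;                // how "wide" the key space is
            long totalMessages = 1000000L;       // N messages to write
            double deleteRatio = 0.05;           // fraction of sends that are tombstones

            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            Random random = new Random();
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (long i = 0; i < totalMessages; i++) {
                    // Keys cycle through a bounded range so compaction has duplicates to collapse.
                    String key = Long.toString(i % keyRange);
                    // A null value is a delete (tombstone) on a compacted topic.
                    String value = random.nextDouble() < deleteRatio ? null : ("payload-" + i);
                    producer.send(new ProducerRecord<>(topic, key, value));
                }
            }
        }
    }

The knobs I'd want the perf tooling to expose are basically the four constants at the top: key range, message count, delete ratio, and which topic (with what compaction settings) to write into.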
I certainly welcome some topic/partition-specific compaction metrics, and would be willing to contribute there.

Thanks

Tom

On Wed, May 18, 2016 at 1:32 PM, Manikumar Reddy <manikumar.re...@gmail.com> wrote:

> Hi,
>
> There is a kafka.tools.TestLogCleaning tool, which is used to stress test
> the compaction feature. This tool validates the correctness of the compaction
> process. This tool can be improved for perf testing.
>
> I think you want to benchmark the server-side compaction process. Currently
> we have a few compaction-related metrics. We may need to add a few more
> topic-specific metrics for better analysis.
>
> Log compaction related JMX metrics:
> kafka.log:type=LogCleaner,name=cleaner-recopy-percent
> kafka.log:type=LogCleaner,name=max-buffer-utilization-percent
> kafka.log:type=LogCleaner,name=max-clean-time-secs
> kafka.log:type=LogCleanerManager,name=max-dirty-percent
>
> Manikumar
>
> On Tue, May 17, 2016 at 8:45 PM, Tom Crayford <tcrayf...@heroku.com> wrote:
>
> > Hi there,
> >
> > As noted in the 0.10.0.0-RC4 release thread, we (Heroku Kafka) have been
> > doing extensive benchmarking of Kafka. In our case this is to help give
> > customers a good idea of the performance of our various configurations.
> > For this we orchestrate the Kafka `producer-perf.sh` and `consumer-perf.sh`
> > scripts across multiple machines, which was relatively easy to do and very
> > successful (recently leading to a doc change and a good lesson about 0.10).
> >
> > However, we're finding one thing missing from the current producer/consumer
> > perf tests, which is that there's no good perf testing on compacted topics.
> > Some folk will undoubtedly use compacted topics, so it would be extremely
> > helpful (I think) for the community to have benchmarks that test
> > performance on compacted topics. We're interested in working on this and
> > contributing it upstream, but are pretty unsure what such a test should
> > look like. One straw proposal is to adapt the existing producer/consumer
> > perf tests to work on a compacted topic, likely with an additional flag on
> > the producer that lets you choose how wide a key range to emit, whether it
> > should emit deletes (and how often to do so), and so on. Is there anything
> > more we could or should do there?
> >
> > We're happy writing the code here, and want to continue contributing back;
> > I'd just love a hand thinking about what perf tests for compacted topics
> > should look like.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
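
PS: for our own runs we'll probably just scrape those LogCleaner gauges over the broker's JMX port while the perf scripts are running. A rough sketch of what I mean is below; the broker host/port, the polling interval, and the assumption that the gauges expose their reading as a "Value" attribute are all mine, not anything the existing tools provide:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LogCleanerMetricsPoller {
        public static void main(String[] args) throws Exception {
            // Hypothetical broker JMX endpoint (e.g. broker started with JMX_PORT=9999).
            JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
            String[] gauges = {
                "kafka.log:type=LogCleaner,name=cleaner-recopy-percent",
                "kafka.log:type=LogCleaner,name=max-buffer-utilization-percent",
                "kafka.log:type=LogCleaner,name=max-clean-time-secs",
                "kafka.log:type=LogCleanerManager,name=max-dirty-percent"
            };
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                for (int poll = 0; poll < 60; poll++) {      // poll for the length of a run
                    for (String name : gauges) {
                        // Assumes the gauge's current reading is exposed as the "Value" attribute.
                        Object value = mbs.getAttribute(new ObjectName(name), "Value");
                        System.out.println(name + " = " + value);
                    }
                    Thread.sleep(10000);                      // arbitrary 10s interval
                }
            } finally {
                connector.close();
            }
        }
    }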