Hi,

I'm interested in benchmarking the impact of compaction on producers,
consumers, and long-term cluster stability. That's not *quite* the server-side
impact of compaction, but it certainly plays into it. For example, I'd like to
be able to answer: "In configuration X, if we write N messages into a compacted
topic with a certain key range, a certain number of deletes, etc., and *then*
replay that into a consumer that does nothing, how long does that consumer
take?" What happens if we're continually running the compaction process,
restarting consumers every hour or two, *and* producing a lot of messages?
What happens to perf on compacted topics with different disk configurations
(e.g. magnetic vs. SSD, RAID vs. JBOD)?
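
To make that concrete, here's a rough, self-contained sketch of that first
scenario: produce N keyed messages (with a bounded key range and some
tombstones) into a compacted topic, wait for the cleaner, then time a
do-nothing consumer replaying what's left. The topic name, bootstrap server,
record counts, delete ratio, and the "stop after ten empty polls" heuristic
are all placeholders, not a proposal for the final tool:

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import java.util.*;

public class CompactedTopicReplayBench {
    public static void main(String[] args) throws Exception {
        String topic = "compaction-bench";  // assumed: created with cleanup.policy=compact
        long numRecords = 1_000_000L;
        int keyRange = 10_000;              // how "wide" the key space is
        double deleteRatio = 0.05;          // fraction of records sent as tombstones

        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.StringSerializer");
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.StringSerializer");
        Random rnd = new Random(42);
        try (Producer<String, String> producer = new KafkaProducer<>(p)) {
            for (long i = 0; i < numRecords; i++) {
                String key = "key-" + rnd.nextInt(keyRange);
                // a null value is a tombstone, i.e. a delete for that key
                String value = rnd.nextDouble() < deleteRatio ? null : "value-" + i;
                producer.send(new ProducerRecord<>(topic, key, value));
            }
        }

        // ...wait here for the log cleaner to run before replaying...

        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "compaction-bench-" + System.nanoTime());
        c.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.StringDeserializer");
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.StringDeserializer");
        long consumed = 0;
        long start = System.nanoTime();
        try (Consumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList(topic));
            int idlePolls = 0;
            while (idlePolls < 10) {  // crude: stop after ~10s of silence
                // poll(long) matches the 0.10-era API; newer clients use poll(Duration)
                ConsumerRecords<String, String> records = consumer.poll(1000);
                consumed += records.count();
                idlePolls = records.isEmpty() ? idlePolls + 1 : 0;
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("replayed %d records in %.1fs (%.0f msg/s)%n",
                          consumed, seconds, consumed / seconds);
    }
}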

I'd certainly welcome some topic/partition-specific compaction metrics, and
would be willing to contribute there.
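
While running something like the above, it's also handy to poll the existing
cleaner metrics you list below. A minimal sketch, assuming the broker was
started with a JMX port exposed (9999 here is a placeholder) and that the
gauges expose their reading via the usual "Value" attribute (worth confirming
in jconsole first):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CleanerMetricsProbe {
    public static void main(String[] args) throws Exception {
        // assumed: broker started with JMX_PORT=9999
        String url = "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi";
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            String[] beans = {
                "kafka.log:type=LogCleaner,name=cleaner-recopy-percent",
                "kafka.log:type=LogCleaner,name=max-buffer-utilization-percent",
                "kafka.log:type=LogCleaner,name=max-clean-time-secs",
                "kafka.log:type=LogCleanerManager,name=max-dirty-percent"
            };
            for (String bean : beans) {
                // gauge MBeans typically expose a single "Value" attribute
                Object value = conn.getAttribute(new ObjectName(bean), "Value");
                System.out.println(bean + " = " + value);
            }
        } finally {
            connector.close();
        }
    }
}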

Thanks

Tom

On Wed, May 18, 2016 at 1:32 PM, Manikumar Reddy <manikumar.re...@gmail.com>
wrote:

> Hi,
>
> There is a kafka.tools.TestLogCleaning tool, which is used to stress-test
> the compaction feature. It validates the correctness of the compaction
> process and could be improved for perf testing.
>
> I think you want to benchmark the server-side compaction process. Currently
> we have a few compaction-related metrics. We may need to add a few more
> topic-specific metrics for better analysis.
>
> log compaction related JMX metrics:
> kafka.log:type=LogCleaner,name=cleaner-recopy-percent
> kafka.log:type=LogCleaner,name=max-buffer-utilization-percent
> kafka.log:type=LogCleaner,name=max-clean-time-secs
> kafka.log:type=LogCleanerManager,name=max-dirty-percent
>
> Manikumar
>
> On Tue, May 17, 2016 at 8:45 PM, Tom Crayford <tcrayf...@heroku.com>
> wrote:
>
> > Hi there,
> >
> > As noted in the 0.10.0.0-RC4 release thread, we (Heroku Kafka) have been
> > doing extensive benchmarking of Kafka. In our case this is to help give
> > customers a good idea of the performance of our various configurations.
> > For this we orchestrate the Kafka `producer-perf.sh` and `consumer-perf.sh`
> > across multiple machines, which was relatively easy to do and very
> > successful (recently leading to a doc change and a good lesson about 0.10).
> >
> > However, we're finding one thing missing from the current
> > producer/consumer perf tests: there's no good perf testing on compacted
> > topics. Some folks will undoubtedly use compacted topics, so it would be
> > extremely helpful (I think) for the community to have benchmarks that test
> > performance on them. We're interested in working on this and contributing
> > it upstream, but are pretty unsure what such a test should look like. One
> > straw proposal is to adapt the existing producer/consumer perf tests to
> > work on a compacted topic, likely with an additional flag on the producer
> > that lets you choose how wide a key range to emit, whether it should emit
> > deletes (and how often to do so), and so on. Is there anything more we
> > could or should do there?
> >
> > We're happy to write the code here, and want to continue contributing
> > back; I'd just love a hand thinking about what perf tests for compacted
> > topics should look like.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
>
