[ https://issues.apache.org/jira/browse/KAFKA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352731#comment-17352731 ]
A. Sophie Blee-Goldman commented on KAFKA-8295:
-----------------------------------------------

Thanks for the initial results. I think it would be valuable to try plugging it into Kafka Streams with a basic POC and then running some kind of throughput benchmark. I imagine you can get some idea of how well this works even with some very rough benchmarks, for example loading up an input topic with a very large amount of data and then using the TopologyTestDriver to compare how many records can be processed within some constant time (e.g. 5 minutes) between the POC and the original. As long as there is enough input data to ensure it won't run out of records to process before that time limit is up, this should give us a good sense of how the merge operator compares.

Does that make sense? It may be that the JMH benchmarks for the ByteBuffer optimization could be reused for this too.

> Optimize count() using RocksDB merge operator
> ---------------------------------------------
>
>                 Key: KAFKA-8295
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8295
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Sagar Rao
>            Priority: Major
>
> In addition to regular put/get/delete, RocksDB provides a fourth operation,
> merge. This essentially provides an optimized read/update/write path in a
> single operation. One of the built-in (C++) merge operators exposed over the
> Java API is a counter. We should be able to leverage this for a more
> efficient implementation of count().
>
> (Note: Unfortunately it seems unlikely we can use this to optimize general
> aggregations, even if RocksJava allowed for a custom merge operator, unless
> we provide a way for the user to specify and connect a C++ implemented
> aggregator; otherwise we incur too much cost crossing the JNI boundary for
> there to be a net performance benefit.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
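
For anyone skimming the thread, here is a rough sketch of the difference the issue description is getting at: count() as a read/update/write sequence versus count() as a single merge call. This is only an in-memory illustration and makes several assumptions. The class name CountMergeSketch and both method names are hypothetical, a plain java.util.HashMap stands in for the state store, and Map.merge stands in for the RocksDB counter merge operator. It is not the RocksJava API and says nothing about actual performance; it only shows that the two paths should produce the same counts.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustration only: a HashMap stands in for the state store.
 * Nothing here calls RocksDB or Kafka Streams.
 */
public class CountMergeSketch {

    /** Read/update/write path: one read followed by one write per record. */
    public static void countWithGetPut(Map<String, Long> store, String key) {
        Long current = store.get(key);                       // read
        long next = (current == null) ? 1L : current + 1L;   // update
        store.put(key, next);                                // write
    }

    /**
     * Merge-style path: the caller only submits the increment and the store
     * combines it with whatever value is already there. Map.merge is used
     * as a stand-in for a counter merge operator (assumption).
     */
    public static void countWithMerge(Map<String, Long> store, String key) {
        store.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "a", "c", "a", "b"};

        Map<String, Long> viaGetPut = new HashMap<>();
        Map<String, Long> viaMerge = new HashMap<>();
        for (String key : records) {
            countWithGetPut(viaGetPut, key);
            countWithMerge(viaMerge, key);
        }

        // Both paths should agree on every per-key count.
        System.out.println("counts match: " + viaGetPut.equals(viaMerge));
    }
}
```

In a real POC the second method would presumably be replaced by a merge call against a store opened with the built-in counter operator, and whether that beats the get/put path once the JNI crossing is included is exactly what the throughput benchmark suggested above would need to show.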