[ https://issues.apache.org/jira/browse/KAFKA-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722855#comment-16722855 ]
sam commented on KAFKA-7432:
----------------------------

[~Yohan123] This ticket is not really about "microbatching"; perhaps you could call it "nanobatching", since unlike Spark these batches are meant to be very small. Kafka itself is technically always nanobatching anyway, since you do not want to ack messages one by one, as that is very inefficient. Typically, when you use the lower-level KafkaConsumer API, you will process very small batches of data. I would hazard a guess (without reading the code) that this is also how Kafka Streams is implemented.

> API Method on Kafka Streams for processing chunks/batches of data
> -----------------------------------------------------------------
>
>                 Key: KAFKA-7432
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7432
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: sam
>            Priority: Major
>
> For many situations in Big Data it is preferable to work with a small buffer of records at a go, rather than one record at a time.
> The natural example is calling some external API that supports batching for efficiency.
> How can we do this in Kafka Streams? I cannot find anything in the API that looks like what I want.
> So far I have:
> {{builder.stream[String, String]("my-input-topic").mapValues(externalApiCall).to("my-output-topic")}}
> What I want is:
> {{builder.stream[String, String]("my-input-topic").batched(chunkSize = 2000).map(externalBatchedApiCall).to("my-output-topic")}}
> In Scala and Akka Streams the function is called {{grouped}} or {{batch}}. In Spark Structured Streaming we can do {{mapPartitions.map(_.grouped(2000).map(externalBatchedApiCall))}}.
>
> https://stackoverflow.com/questions/52366623/how-to-process-data-in-chunks-batches-with-kafka-streams
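To make the comment's point concrete, here is a minimal sketch (not part of the ticket) of the lower-level KafkaConsumer loop it describes: every {{poll()}} already hands back a small batch of records rather than a single message, {{max.poll.records}} caps the batch size, and the whole batch is acknowledged with one commit. The broker address, group id, and the {{externalBatchedApiCall}} helper are placeholders, and the collection-converter import assumes Scala 2.13+.

{code:scala}
import java.time.Duration
import java.util.{Collections, Properties}

import scala.jdk.CollectionConverters._

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer

object NanoBatchConsumer {

  // Placeholder for whatever external API accepts a whole batch of values at once.
  def externalBatchedApiCall(batch: Seq[String]): Unit =
    println(s"calling external API with ${batch.size} records")

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "nano-batch-example")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
    // Caps how many records a single poll() may return.
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "2000")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("my-input-topic"))

    try {
      while (true) {
        // poll() returns a (usually small) batch of records, never one record at a time.
        val records = consumer.poll(Duration.ofMillis(500)).asScala.toSeq
        if (records.nonEmpty) {
          externalBatchedApiCall(records.map(_.value()))
          // Ack the whole batch at once instead of record by record.
          consumer.commitSync()
        }
      }
    } finally consumer.close()
  }
}
{code}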
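As for the feature request itself: since the DSL has no {{batched(chunkSize)}} operator, the usual workaround today is to drop down to the Processor API. The following is only a sketch of that workaround, not an official recipe: a {{Transformer}} buffers key-value pairs in memory, calls the hypothetical {{externalBatchedApiCall}} once 2000 records have accumulated (or a wall-clock punctuation fires), and forwards one result per input record. Exact Scala DSL import paths vary between Kafka versions, and the in-memory buffer loses its contents on failure; a production version would back it with a state store.

{code:scala}
import java.time.Duration

import scala.collection.mutable.ArrayBuffer

import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.{Transformer, TransformerSupplier}
import org.apache.kafka.streams.processor.{ProcessorContext, PunctuationType}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._
import org.apache.kafka.streams.scala.StreamsBuilder

object ChunkedStreamExample {

  // Placeholder: one external call per chunk, returning one result per input value.
  def externalBatchedApiCall(batch: Seq[String]): Seq[String] = batch.map(_.toUpperCase)

  // Buffers records and emits results in chunks of at most `chunkSize`.
  class ChunkingTransformer(chunkSize: Int)
      extends Transformer[String, String, KeyValue[String, String]] {

    private var context: ProcessorContext = _
    private val buffer = ArrayBuffer.empty[(String, String)]

    override def init(ctx: ProcessorContext): Unit = {
      context = ctx
      // Also flush on wall-clock time, so a quiet partition cannot hold records forever.
      context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, (_: Long) => flush())
    }

    override def transform(key: String, value: String): KeyValue[String, String] = {
      buffer += (key -> value)
      if (buffer.size >= chunkSize) flush()
      null // results are emitted via context.forward inside flush()
    }

    private def flush(): Unit =
      if (buffer.nonEmpty) {
        val results = externalBatchedApiCall(buffer.map(_._2).toSeq)
        buffer.map(_._1).zip(results).foreach { case (k, v) => context.forward(k, v) }
        buffer.clear()
      }

    // Forwarding is not allowed from close(), so anything still buffered here is
    // dropped; a state-store-backed buffer would survive a restart instead.
    override def close(): Unit = buffer.clear()
  }

  def buildTopology(): StreamsBuilder = {
    val builder = new StreamsBuilder()
    builder
      .stream[String, String]("my-input-topic")
      .transform(new TransformerSupplier[String, String, KeyValue[String, String]] {
        override def get(): Transformer[String, String, KeyValue[String, String]] =
          new ChunkingTransformer(chunkSize = 2000)
      })
      .to("my-output-topic")
    builder
  }
}
{code}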