[ https://issues.apache.org/jira/browse/KAFKA-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722855#comment-16722855 ]
sam commented on KAFKA-7432:
----------------------------

[~Yohan123] This ticket is not really about "microbatching"; perhaps you could call it "nanobatching", since unlike Spark these batches are meant to be very small. Kafka itself is technically always nanobatching anyway, since you do not want to ack messages one by one, as that is very inefficient. Typically, when you use the lower-level KafkaConsumer API, you will process very small batches of data. I would hazard a guess (without reading the code) that this is also how Kafka Streams is implemented.

> API Method on Kafka Streams for processing chunks/batches of data
> -----------------------------------------------------------------
>
>                 Key: KAFKA-7432
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7432
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: sam
>            Priority: Major
>
> For many situations in Big Data it is preferable to work with a small buffer of records at a go, rather than one record at a time.
> The natural example is calling some external API that supports batching for efficiency.
> How can we do this in Kafka Streams? I cannot find anything in the API that looks like what I want.
> So far I have:
> {{builder.stream[String, String]("my-input-topic").mapValues(externalApiCall).to("my-output-topic")}}
> What I want is:
> {{builder.stream[String, String]("my-input-topic").batched(chunkSize = 2000).map(externalBatchedApiCall).to("my-output-topic")}}
> In Scala and Akka Streams the function is called {{grouped}} or {{batch}}. In Spark Structured Streaming we can do {{mapPartitions.map(_.grouped(2000).map(externalBatchedApiCall))}}.
>
> https://stackoverflow.com/questions/52366623/how-to-process-data-in-chunks-batches-with-kafka-streams
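To make the comment's point concrete, here is a minimal sketch (not part of the ticket) of the lower-level KafkaConsumer loop it describes: every {{poll()}} already hands back a small batch of records rather than a single message, {{max.poll.records}} caps the batch size, and the whole batch is acknowledged with one commit. The broker address, group id, and the {{externalBatchedApiCall}} helper are placeholders, and the collection-converter import assumes Scala 2.13+.

{code:scala}
import java.time.Duration
import java.util.{Collections, Properties}

import scala.jdk.CollectionConverters._

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer

object NanoBatchConsumer {

  // Placeholder for whatever external API accepts a whole batch of values at once.
  def externalBatchedApiCall(batch: Seq[String]): Unit =
    println(s"calling external API with ${batch.size} records")

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "nano-batch-example")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
    // Caps how many records a single poll() may return.
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "2000")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("my-input-topic"))

    try {
      while (true) {
        // poll() returns a (usually small) batch of records, never one record at a time.
        val records = consumer.poll(Duration.ofMillis(500)).asScala.toSeq
        if (records.nonEmpty) {
          externalBatchedApiCall(records.map(_.value()))
          // Ack the whole batch at once instead of record by record.
          consumer.commitSync()
        }
      }
    } finally consumer.close()
  }
}
{code}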
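As for the feature request itself: since the DSL has no {{batched(chunkSize)}} operator, the usual workaround today is to drop down to the Processor API. The following is only a sketch of that workaround, not an official recipe: a {{Transformer}} buffers key-value pairs in memory, calls the hypothetical {{externalBatchedApiCall}} once 2000 records have accumulated (or a wall-clock punctuation fires), and forwards one result per input record. Exact Scala DSL import paths vary between Kafka versions, and the in-memory buffer loses its contents on failure; a production version would back it with a state store.

{code:scala}
import java.time.Duration

import scala.collection.mutable.ArrayBuffer

import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.{Transformer, TransformerSupplier}
import org.apache.kafka.streams.processor.{ProcessorContext, PunctuationType}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._
import org.apache.kafka.streams.scala.StreamsBuilder

object ChunkedStreamExample {

  // Placeholder: one external call per chunk, returning one result per input value.
  def externalBatchedApiCall(batch: Seq[String]): Seq[String] = batch.map(_.toUpperCase)

  // Buffers records and emits results in chunks of at most `chunkSize`.
  class ChunkingTransformer(chunkSize: Int)
      extends Transformer[String, String, KeyValue[String, String]] {

    private var context: ProcessorContext = _
    private val buffer = ArrayBuffer.empty[(String, String)]

    override def init(ctx: ProcessorContext): Unit = {
      context = ctx
      // Also flush on wall-clock time, so a quiet partition cannot hold records forever.
      context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, (_: Long) => flush())
    }

    override def transform(key: String, value: String): KeyValue[String, String] = {
      buffer += (key -> value)
      if (buffer.size >= chunkSize) flush()
      null // results are emitted via context.forward inside flush()
    }

    private def flush(): Unit =
      if (buffer.nonEmpty) {
        val results = externalBatchedApiCall(buffer.map(_._2).toSeq)
        buffer.map(_._1).zip(results).foreach { case (k, v) => context.forward(k, v) }
        buffer.clear()
      }

    // Forwarding is not allowed from close(), so anything still buffered here is
    // dropped; a state-store-backed buffer would survive a restart instead.
    override def close(): Unit = buffer.clear()
  }

  def buildTopology(): StreamsBuilder = {
    val builder = new StreamsBuilder()
    builder
      .stream[String, String]("my-input-topic")
      .transform(new TransformerSupplier[String, String, KeyValue[String, String]] {
        override def get(): Transformer[String, String, KeyValue[String, String]] =
          new ChunkingTransformer(chunkSize = 2000)
      })
      .to("my-output-topic")
    builder
  }
}
{code}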