You may actually want this implemented in a Streams app eventually; there is a KIP under discussion to support this type of incremental batch processing in Streams: https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams
However, for now the approach you mentioned using a consumer would be the best approach. When you start up the app you can use the endOffsets API to determine what offset you should treat as the last offset: http://docs.confluent.io/3.1.1/clients/javadocs/org/apache/kafka/clients/consumer/KafkaConsumer.html#endOffsets(java.util.Collection)

In terms of memory usage, you'll simply need to process in reasonably sized blocks. If you can already handle incremental processing like this, then presumably it should be possible to create smaller sub-blocks and just run that process N times if you have too many messages. A rough sketch of this blocked approach is included at the end of this message.

-Ewen

On Sat, Dec 10, 2016 at 10:29 AM, Dominik Safaric <dominiksafa...@gmail.com> wrote:

> Hi everyone,
>
> What is among the most efficient ways to quickly consume, transform, and
> process Kafka messages? Importantly, I am not referring to nor interested in
> Streams, because the Kafka topic from which I would like to process the
> messages will eventually stop receiving messages, after which I should
> process the messages by extracting certain keys in a batch-processing-like
> manner.
>
> So far I’ve implemented a Kafka consumer group that consumes these
> messages, hashes them according to a certain key, and upon retrieval of the
> last message starts the processing script. However, I am dealing with
> exactly 100,000,000 log messages, each of 16 bytes, meaning that preserving
> 1.6GB of data in-memory, i.e. on heap, is not the most efficient approach -
> performance- and memory-wise.
>
> Regards,
> Dominik

--
Thanks,
Ewen
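
Below is a rough, untested sketch of the endOffsets-based approach in Java. It assumes a locally reachable broker, byte-array values, a hypothetical topic name "logs", and a hypothetical processBlock() step standing in for the key-extraction/hashing logic; block size and configs are placeholders to tune. It snapshots endOffsets at startup, reads each partition only up to that snapshot, and hands off bounded blocks instead of keeping all the records on the heap.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class BoundedBatchConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumption: broker address
        props.put("group.id", "bounded-batch-example");     // hypothetical group id
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        String topic = "logs";      // hypothetical topic name
        int blockSize = 1_000_000;  // tune so one block fits comfortably in memory

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Assign partitions explicitly so we control where consumption stops.
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo p : consumer.partitionsFor(topic)) {
                partitions.add(new TopicPartition(topic, p.partition()));
            }
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);

            // Snapshot the end offsets at startup; treat them as the last offsets to process.
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            Set<TopicPartition> remaining = new HashSet<>(partitions);
            for (TopicPartition tp : partitions) {
                if (consumer.position(tp) >= endOffsets.get(tp)) {
                    remaining.remove(tp);  // partition was already empty at startup
                }
            }

            List<ConsumerRecord<byte[], byte[]>> block = new ArrayList<>(blockSize);
            while (!remaining.isEmpty()) {
                // poll(long) matches the 0.10.x clients; newer clients use poll(Duration).
                ConsumerRecords<byte[], byte[]> records = consumer.poll(500);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                    long end = endOffsets.get(tp);
                    if (record.offset() < end) {
                        block.add(record);
                        if (block.size() >= blockSize) {
                            processBlock(block);  // hypothetical per-block processing step
                            block.clear();
                        }
                    }
                    if (record.offset() >= end - 1) {
                        remaining.remove(tp);  // reached the snapshotted end of this partition
                    }
                }
            }
            if (!block.isEmpty()) {
                processBlock(block);
            }
        }
    }

    private static void processBlock(List<ConsumerRecord<byte[], byte[]>> block) {
        // Placeholder: extract/hash keys and run the batch step over this bounded chunk.
    }
}

The same loop can be rerun over smaller sub-blocks if even one block is too large; only the blockSize constant and the processBlock() hand-off need to change.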