Hi Rahul!

1. Will the Flink Kafka consumer 0.8.x run on multiple task slots or a single task
slot? Basically, I mean is it going to be a parallel operation or a non-parallel
operation?
The FlinkKafkaConsumer is a parallel consumer: it runs with whatever parallelism
you configure for the source, so its parallel instances can be spread across
multiple task slots.
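For example, a minimal sketch (the topic name, broker/ZooKeeper addresses, and
group id below are just placeholders for illustration) could look like this; the
source's parallelism is set like any other operator's:

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ParallelKafkaSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181"); // the 0.8 consumer still needs ZooKeeper
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "example-group");           // placeholder group id

        // Each parallel instance of this source is a subtask that is scheduled
        // into a task slot like any other operator.
        DataStream<String> stream = env
            .addSource(new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props))
            .setParallelism(4); // four parallel consumer instances

        stream.print();
        env.execute("Parallel Kafka consumer example");
    }
}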

2. If it's a parallel operation, then do multiple task slots read data from a
single Kafka partition or from multiple Kafka partitions?
Each parallel instance of a FlinkKafkaConsumer source can subscribe to
multiple Kafka partitions. Each Kafka partition is handled by exactly one
FlinkKafkaConsumer parallel instance.
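As a rough illustration of that idea (this is just a toy sketch, not the
connector's actual assignment code), you can think of the partitions being
spread over the parallel instances so that every partition has exactly one owner:

public class PartitionAssignmentSketch {
    public static void main(String[] args) {
        int numPartitions = 6; // e.g. a topic with 6 partitions
        int numSubtasks = 3;   // e.g. a consumer source with parallelism 3

        // Every partition maps to exactly one subtask; a subtask may own several partitions.
        for (int partition = 0; partition < numPartitions; partition++) {
            int owningSubtask = partition % numSubtasks;
            System.out.printf("partition %d -> consumer subtask %d%n", partition, owningSubtask);
        }
    }
}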

3. If data is read from multiple Kafka partitions, then how is duplication
avoided? Is it done by Kafka or by Flink?
I’m not sure exactly what you are referring to by “duplication” here. Do you
mean duplication in the data itself in the Kafka topics, or duplicated
consumption by Flink?
If it is the former: prior to Kafka 0.11, Kafka did not support transactional
writes, so writes to Kafka topics can only be at-least-once.
If you mean the latter: the FlinkKafkaConsumer achieves exactly-once guarantees 
when consuming from Kafka topics using Flink’s checkpointing mechanism. You can 
read about that here [1][2].
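As a minimal sketch of what that looks like on the job side (the 10-second
interval is just an arbitrary example value):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedConsumerExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 10 seconds. The FlinkKafkaConsumer stores the Kafka
        // offsets it has read as part of each checkpoint; on failure, Flink
        // restores the last completed checkpoint and rewinds the consumer to
        // those offsets, so no record is counted twice in Flink's state.
        env.enableCheckpointing(10000);

        // ... build the rest of the pipeline on env as usual ...
    }
}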
Hope the pointers help!

- Gordon

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html#kafka-consumers-and-fault-tolerance

On 22 September 2017 at 10:46:55 AM, Rahul Raj (rahulrajms...@gmail.com) wrote:

Hi,

I have just started working with Flink, and I am working on a project which
involves reading Kafka data and processing it. The following questions came to
my mind:

1. Will the Flink Kafka consumer 0.8.x run on multiple task slots or a single task
slot? Basically, I mean is it going to be a parallel operation or a non-parallel
operation?

2. If it's a parallel operation, then do multiple task slots read data from a
single Kafka partition or from multiple Kafka partitions?

3. If data is read from multiple Kafka partitions, then how is duplication
avoided? Is it done by Kafka or by Flink?

Rahul Raj
