Hi Rahul!

1. Will the Flink Kafka consumer 0.8.x run on multiple task slots or a single task
slot? Basically, I mean is it going to be a parallel operation or a non-parallel
operation?
The FlinkKafkaConsumer is a parallel consumer: it runs with whatever parallelism
you configure for the source, so its parallel instances can be spread across
multiple task slots.
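For example, a minimal sketch (the topic name, broker/ZooKeeper addresses, and
group id below are just placeholders for illustration) could look like this; the
source's parallelism is set like any other operator's:

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ParallelKafkaSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181"); // the 0.8 consumer still needs ZooKeeper
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "example-group");           // placeholder group id

        // Each parallel instance of this source is a subtask that is scheduled
        // into a task slot like any other operator.
        DataStream<String> stream = env
            .addSource(new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props))
            .setParallelism(4); // four parallel consumer instances

        stream.print();
        env.execute("Parallel Kafka consumer example");
    }
}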

2. If it's a parallel operation, then do multiple task slots read data from a
single Kafka partition or from multiple Kafka partitions?
Each parallel instance of a FlinkKafkaConsumer source can subscribe to
multiple Kafka partitions. Each Kafka partition is handled by exactly one
FlinkKafkaConsumer parallel instance.
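As a rough illustration of that idea (this is just a toy sketch, not the
connector's actual assignment code), you can think of the partitions being
spread over the parallel instances so that every partition has exactly one owner:

public class PartitionAssignmentSketch {
    public static void main(String[] args) {
        int numPartitions = 6; // e.g. a topic with 6 partitions
        int numSubtasks = 3;   // e.g. a consumer source with parallelism 3

        // Every partition maps to exactly one subtask; a subtask may own several partitions.
        for (int partition = 0; partition < numPartitions; partition++) {
            int owningSubtask = partition % numSubtasks;
            System.out.printf("partition %d -> consumer subtask %d%n", partition, owningSubtask);
        }
    }
}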

3. If data is read from multiple Kafka partitions, then how is duplication
avoided? Is it done by Kafka or by Flink?
I’m not sure exactly what you are referring to by “duplication” here. Do you
mean duplication in the data itself in the Kafka topics, or duplicated
consumption by Flink?
If it is the former: prior to Kafka 0.11, Kafka did not support transactional
writes, so writes to Kafka topics can only be at-least-once.
If you mean the latter: the FlinkKafkaConsumer achieves exactly-once guarantees 
when consuming from Kafka topics using Flink’s checkpointing mechanism. You can 
read about that here [1][2].
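As a minimal sketch of what that looks like on the job side (the 10-second
interval is just an arbitrary example value):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedConsumerExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 10 seconds. The FlinkKafkaConsumer stores the Kafka
        // offsets it has read as part of each checkpoint; on failure, Flink
        // restores the last completed checkpoint and rewinds the consumer to
        // those offsets, so no record is counted twice in Flink's state.
        env.enableCheckpointing(10000);

        // ... build the rest of the pipeline on env as usual ...
    }
}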
Hope the pointers help!

- Gordon

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html#kafka-consumers-and-fault-tolerance

On 22 September 2017 at 10:46:55 AM, Rahul Raj (rahulrajms...@gmail.com) wrote:

Hi,

I have just started working with Flink, and I am working on a project which
involves reading Kafka data and processing it. The following questions came to
my mind:

1. Will the Flink Kafka consumer 0.8.x run on multiple task slots or a single task
slot? Basically, I mean is it going to be a parallel operation or a non-parallel
operation?

2. If it's a parallel operation, then do multiple task slots read data from a
single Kafka partition or from multiple Kafka partitions?

3. If data is read from multiple Kafka partitions, then how is duplication
avoided? Is it done by Kafka or by Flink?

Rahul Raj
