Hello,

As far as I've seen, a lot of projects use Flink and Kafka together, but I
don't quite see the point of combining them. Let me know what you think
about the following.

1. If I'm not wrong, Kafka provides basically two things: storage (record
retention) and fault tolerance in case of failure, while Flink mostly cares
about transforming those records. That means I can write a pipeline with
Flink alone, and even distribute it on a cluster, but in case of failure
some records may be lost, and I won't be able to reprocess the data if I
change the code, since Flink doesn't keep the records by default (only once
they are properly written to a sink). Is that right?
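
For example (just a sketch of what I understand; the topic name
"raw-events" and the other names are invented), if the records are retained
in a Kafka topic, a modified job could simply re-read them from the
beginning, which a plain Flink pipeline can't do once its inputs have been
consumed:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ReplaySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "reprocessing-run");

        // Because Kafka retains the records, a new or changed job can
        // replay the whole retained history from the start.
        FlinkKafkaConsumer<String> consumer =
            new FlinkKafkaConsumer<>("raw-events", new SimpleStringSchema(), props);
        consumer.setStartFromEarliest();

        env.addSource(consumer).print();
        env.execute("replay-from-kafka");
    }
}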

2. In my use case the records come from a WebSocket, and I build instances
of a custom class from the messages on that socket. Should I put those
records into a Kafka topic right away, using a Flink custom source
(SourceFunction) with a Kafka sink (FlinkKafkaProducer), and independently
create a Kafka source (FlinkKafkaConsumer) for that topic and pipe the
Flink transformations there? Is that data flow fine?
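
Something like this is what I have in mind (a rough sketch only: I'm using
String records with SimpleStringSchema instead of my custom class, the
topic name "raw-events" is a placeholder, and the exact Kafka connector
class names vary by Flink version):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class PipelineSketch {

    // Placeholder custom source: would wrap the actual WebSocket client
    // and emit one element per incoming message until cancelled.
    public static class WebSocketSource implements SourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String message = "...";  // read the next WebSocket message here
                ctx.collect(message);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

    // Job 1: WebSocket -> Kafka. Does nothing but persist the raw records.
    static void ingestJob(Properties props) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(new WebSocketSource())
           .addSink(new FlinkKafkaProducer<>("raw-events",
                new SimpleStringSchema(), props));
        env.execute("websocket-to-kafka");
    }

    // Job 2: Kafka -> transformations, deployed and restarted independently.
    static void processJob(Properties props) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(new FlinkKafkaConsumer<>("raw-events",
                new SimpleStringSchema(), props))
           .map(s -> s.toUpperCase())  // stand-in for the real transformations
           .print();
        env.execute("kafka-processing");
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "processing-job");
        // In practice these would be two separate applications.
        if (args.length > 0 && "ingest".equals(args[0])) {
            ingestJob(props);
        } else {
            processJob(props);
        }
    }
}

The idea behind the two separate jobs is that the processing one could be
modified and restarted (replaying the topic if needed) without touching the
ingestion one.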

Basically, what I'm trying to understand with both questions is how and why
people are using Flink and Kafka together.

Regards,
Matt
