"So in your case I would directly ingest my messages into Kafka" I will do that through a custom SourceFunction that reads the messages from the WebSocket, creates simple java objects (POJOs) and sink them in a Kafka topic using a FlinkKafkaProducer, if that makes sense.
The problem now is that I need a DeserializationSchema for my class. I read that Flink is able to de/serialize POJOs on its own, but I'm still required to provide a serializer to create the FlinkKafkaProducer (and FlinkKafkaConsumer). Any idea or example? Should I create a DeserializationSchema for each POJO class I want to put into a Kafka stream? (I've pasted a rough sketch of what I have in mind at the bottom, below the quoted messages.)

On Tue, Nov 15, 2016 at 7:43 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Matt,
>
> as you've stated Flink is a stream processor and as such it needs to get
> its inputs from somewhere. Flink can provide you up to exactly-once
> processing guarantees. But in order to do this, it requires a re-playable
> source because in case of a failure you might have to reprocess parts of
> the input you had already processed prior to the failure. Kafka is such a
> source and people use it because it happens to be one of the most popular
> and widespread open source message queues/distributed logs.
>
> If you don't require strong processing guarantees, then you can simply use
> the WebSocket source. But, for any serious use case, you probably want to
> have these guarantees because otherwise you just might calculate bogus
> results. So in your case I would directly ingest my messages into Kafka and
> then let Flink read from the created topic to do the processing.
>
> Cheers,
> Till
>
> On Tue, Nov 15, 2016 at 8:14 AM, Dromit <dromitl...@gmail.com> wrote:
>
>> Hello,
>>
>> As far as I've seen, there are a lot of projects using Flink and Kafka
>> together, but I'm not seeing the point of that. Let me know what you think
>> about this.
>>
>> 1. If I'm not wrong, Kafka provides basically two things: storage
>> (records retention) and fault tolerance in case of failure, while Flink
>> mostly cares about the transformation of such records. That means I can
>> write a pipeline with Flink alone, and even distribute it on a cluster, but
>> in case of failure some records may be lost, or I won't be able to
>> reprocess the data if I change the code, since the records are not kept in
>> Flink by default (only when sinked properly). Is that right?
>>
>> 2. In my use case the records come from a WebSocket and I create a custom
>> class based on messages on that socket. Should I put those records inside a
>> Kafka topic right away using a Flink custom source (SourceFunction) with a
>> Kafka sink (FlinkKafkaProducer), and independently create a Kafka source
>> (KafkaConsumer) for that topic and pipe the Flink transformations there? Is
>> that data flow fine?
>>
>> Basically what I'm trying to understand with both question is how and why
>> people are using Flink and Kafka.
>>
>> Regards,
>> Matt
>>
>
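Regarding the serializer question above: rather than writing one schema per POJO, I was thinking of a single generic JSON-based schema along these lines. Again an untested sketch, assuming Jackson is on the classpath and Event is just my placeholder POJO:

    import java.io.IOException;

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.TypeExtractor;
    import org.apache.flink.streaming.util.serialization.DeserializationSchema;
    import org.apache.flink.streaming.util.serialization.SerializationSchema;

    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonSchema<T> implements SerializationSchema<T>, DeserializationSchema<T> {

        private final Class<T> type;
        private transient ObjectMapper mapper; // not serializable, so created lazily

        public JsonSchema(Class<T> type) {
            this.type = type;
        }

        private ObjectMapper mapper() {
            if (mapper == null) {
                mapper = new ObjectMapper();
            }
            return mapper;
        }

        @Override
        public byte[] serialize(T element) {
            try {
                return mapper().writeValueAsBytes(element);
            } catch (IOException e) {
                throw new RuntimeException("Could not serialize record: " + element, e);
            }
        }

        @Override
        public T deserialize(byte[] message) throws IOException {
            return mapper().readValue(message, type);
        }

        @Override
        public boolean isEndOfStream(T nextElement) {
            return false;
        }

        @Override
        public TypeInformation<T> getProducedType() {
            return TypeExtractor.getForClass(type);
        }
    }

used on both sides like this:

    new FlinkKafkaProducer09<>("localhost:9092", "events", new JsonSchema<>(Event.class));
    new FlinkKafkaConsumer09<>("events", new JsonSchema<>(Event.class), kafkaProperties);

Would something like that work, or is there a built-in schema (TypeInformationSerializationSchema?) that I should use instead?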