Hi Kostas, Thanks for pointing me in the right direction.
I have gone and extended MessageAcknowledgingSourceBase. It was quite easy to do. I have however some follow-up questions about the guarantees it gives and testing my solution. 1. Guarantees: *Questions:* a. When the acknowledgeIDs method is called, is it certain that all the rest of the operators, including the Sinks finished successfully? E.g: If I have a sink that writes to MySQL/Cassandra and one that writes to SQS/Kafka, will the writes to both of these systems have been completed successfully before acknowledgeIDs is called? b. Messages can be duplicated in case the processing takes longer than the queue timeout or if there are failures and Flink needs to recover. This is a problem for sinks that write to non-idempotent systems e.g. SQS, Kafka. What are the recommended approaches to avoid this duplication? Are there any example repos? 2. Testing: *Work done so far:* In order to convince myself that indeed this source is reliable, I wrote some integration tests. I used LocalFlinkMiniCluster which is quite nice. However when I tried to test what happens when I kill the TaskManager running the task that is executing my MessageAcknowledgingSourceBase I found it not so straight forward. I managed to get around it by starting the cluster and the job, getting all the task manager actor references, adding a new task manager to the cluster, killing the initial task managers by sending a poison pill actor msg. I had to kill all the initial task managers as I did not find a way to get a mapping between task running the source and the task manager actor to which it got assigned. *Questions:* a. Is there some better api for fine grained killing of various services, tasks, resources in a Flink cluster or job? b. Can you point me to a repo with reliability tests for Flink i.e. where things are killed to see whether the system recovers etc? Thanks, M On Tue, Apr 25, 2017 at 9:23 AM, Kostas Kloudas <k.klou...@data-artisans.com > wrote: > Hi Martin! > > For an example of a source that acknowledges received messages, you could > check the MessageAcknowledgingSourceBase > and the MultipleIdsMessageAcknowledgingSourceBase that ship with Flink. I > hope this will give you some ideas. > > Now for the Flink version on top of which to implement your source, I > would suggest the Flink 1.3. The reason is that it will > come out soon (~1 month) and it will include a lot of new features and > bug-fixes. Until then, it may change a bit, but the APIs > that you will be using, will not change. > > So why not going straight for the more future-proof way? > > Thanks, > Kostas > > > On Apr 24, 2017, at 11:20 PM, Martin Eden <martineden...@gmail.com> > wrote: > > > > Hi everyone, > > > > Are there any examples of how to implement a reliable (zero data loss) > Flink source reading from a system that is not replay-able but supports > acknowledging messages? > > > > Or any pointers of how one can achieve this and how Flink can help? > > > > I imagine it should involve a write ahead log but not yet clear of how > to implement it and how to integrate with the Flink fault tolerance > mechanism. Can Flink maintain the write ahead log for me? > > > > Also, does it make sense to start implementing this in the current > stable Flink release 1.2 or is there any advantage in implementing it > directly in Flink 1.3 since it is coming up soon anyway? > > > > Thanks, > > M > >