Hi!
If you go with the Batch API, then any failed task (like a sink trying to
insert into the database) will be completely re-executed. That ensures no
data is lost, with no extra effort needed.
It may insert a lot of duplicates, though, if the task is restarted after
half the data was already written.
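This is where your planned upsert helps: if every write is an idempotent
MERGE keyed on the primary key, a replay simply overwrites the rows it
already wrote instead of duplicating them. Below is a minimal sketch of such
a sink as a custom OutputFormat; the class name, table, columns, connection
URL, and credentials are all hypothetical, and batching, connection pooling,
and retries are left out:

import org.apache.flink.api.common.io.OutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sink: writes (id, value) pairs with an idempotent Oracle
// MERGE, so a full re-execution after a failure overwrites existing rows
// instead of duplicating them.
public class OracleUpsertOutputFormat
        implements OutputFormat<Tuple2<Integer, String>> {

    private static final String MERGE_SQL =
        "MERGE INTO my_table t " +
        "USING (SELECT ? AS id, ? AS val FROM dual) s ON (t.id = s.id) " +
        "WHEN MATCHED THEN UPDATE SET t.val = s.val " +
        "WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val)";

    private transient Connection connection;
    private transient PreparedStatement statement;

    @Override
    public void configure(Configuration parameters) {
        // No job-wide configuration needed for this sketch.
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        try {
            // Placeholder connection details.
            connection = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
            statement = connection.prepareStatement(MERGE_SQL);
        } catch (SQLException e) {
            throw new IOException("Could not open JDBC connection", e);
        }
    }

    @Override
    public void writeRecord(Tuple2<Integer, String> record) throws IOException {
        try {
            statement.setInt(1, record.f0);
            statement.setString(2, record.f1);
            statement.executeUpdate();
        } catch (SQLException e) {
            throw new IOException("Upsert failed", e);
        }
    }

    @Override
    public void close() throws IOException {
        try {
            if (statement != null) statement.close();
            if (connection != null) connection.close();
        } catch (SQLException e) {
            throw new IOException("Could not close JDBC resources", e);
        }
    }
}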
Hi Stephan,
thank you very much for your answer. I was happy to meet Robert in Munich last
week, and he suggested that for our problem, batch processing is the way to go.
We also talked about how exactly to guarantee, in this context, that no data is
lost even if the job dies while writing to the database.
Hi!
You can use either the DataSet API or the DataStream API for that. In case of
failures, they would behave slightly differently.
DataSet:
Fault tolerance for the DataSet API works by restarting the job and redoing
all of the work. In some sense, that is similar to what happens in
MapReduce, only that MapReduce re-runs individual failed tasks (which it can
do because it materializes intermediate results), while Flink restarts the
whole job.
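To make that concrete, a DataSet job for your case could look roughly like
the sketch below; it reuses the hypothetical OracleUpsertOutputFormat from
the message above, and the HDFS path and the comma-separated input format
are assumptions:

import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class HdfsToOracleJob {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the input from HDFS (placeholder path).
        DataSet<String> lines = env.readTextFile("hdfs:///data/input");

        // Parse each line into an (id, value) pair; the CSV layout is an
        // assumption.
        DataSet<Tuple2<Integer, String>> records = lines
            .map(line -> {
                String[] parts = line.split(",");
                return new Tuple2<>(Integer.parseInt(parts[0]), parts[1]);
            })
            .returns(new TypeHint<Tuple2<Integer, String>>() {});

        // Write via the idempotent MERGE sink sketched earlier, so a full
        // job restart after a failure overwrites rows instead of
        // duplicating them.
        records.output(new OracleUpsertOutputFormat());

        env.execute("HDFS to Oracle upsert");
    }
}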
Hi everyone,
I am considering using Flink in a project. The setting would be a YARN cluster
where data is first read in from HDFS, then processed and finally written into
an Oracle database using an upsert command. If I understand the documentation
correctly, the DataSet API would be the natural choice for this.
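For reference, submitting such a job to the YARN cluster could look roughly
like this (the YARN-specific flags and their names have varied across Flink
releases, and the container count and memory sizes here are just
placeholders):

./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 2048 hdfs-to-oracle.jar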