Re: Checkpoints in batch processing & JDBC Output Format

2015-11-18 Thread Stephan Ewen
Hi! If you go with the Batch API, any failed task (such as a sink inserting into the database) will be completely re-executed. That ensures no data is lost, with no extra effort needed. It may insert a lot of duplicates, though, if the task is restarted after half the data was already inserted.
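
[Editor's note: one way to make such re-execution harmless is to make the write itself idempotent. Below is a minimal plain-JDBC sketch of that idea, not code from the thread: an Oracle MERGE keyed on the row's id, executed twice to simulate a restarted sink task. The connection details and the ACCOUNTS(ID, BALANCE) table are hypothetical.]

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Demonstrates why an idempotent upsert tolerates batch re-execution:
    // running the same MERGE twice leaves exactly one row per key, so a
    // restarted sink task that replays half its data does no harm.
    // Connection details and the ACCOUNTS table are hypothetical.
    public class IdempotentUpsertDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/orcl", "flink", "secret");
                 PreparedStatement stmt = conn.prepareStatement(
                    "MERGE INTO accounts t "
                  + "USING (SELECT ? AS id, ? AS balance FROM dual) s "
                  + "ON (t.id = s.id) "
                  + "WHEN MATCHED THEN UPDATE SET t.balance = s.balance "
                  + "WHEN NOT MATCHED THEN INSERT (id, balance) "
                  + "VALUES (s.id, s.balance)")) {
                for (int attempt = 0; attempt < 2; attempt++) { // simulate a retry
                    stmt.setInt(1, 42);
                    stmt.setDouble(2, 100.0);
                    stmt.executeUpdate(); // second run updates, never duplicates
                }
            }
        }
    }

[Because the second execution matches on the key and updates instead of inserting, replaying previously written rows leaves the table unchanged.]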

Re: Checkpoints in batch processing & JDBC Output Format

2015-11-16 Thread Maximilian Bode
Hi Stephan, thank you very much for your answer. I was happy to meet Robert in Munich last week, and he proposed that batch processing is the way to go for our problem. We also talked about how exactly to guarantee, in this context, that no data is lost even if the job dies while writing to the database.

Re: Checkpoints in batch processing & JDBC Output Format

2015-11-11 Thread Stephan Ewen
Hi! You can use either the DataSet API or the DataStream API for that. In case of failures, they behave slightly differently. DataSet: Fault tolerance for the DataSet API works by restarting the job and redoing all of the work. In some sense, that is similar to what happens in MapReduce, only ...
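
[Editor's note: for the restart-the-job behavior described here, the number of automatic restarts is configurable on the execution environment. A minimal sketch against the DataSet API of that era, not code from the thread; the retry count is arbitrary.]

    import org.apache.flink.api.java.ExecutionEnvironment;

    public class RetryConfig {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            // On a task failure the whole job is restarted and all work is redone;
            // allow up to 3 such restarts before the job is declared failed.
            env.setNumberOfExecutionRetries(3);
            // ... sources, transformations, and sinks go here ...
            env.fromElements(1, 2, 3).print(); // print() triggers execution
        }
    }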

Checkpoints in batch processing & JDBC Output Format

2015-11-09 Thread Maximilian Bode
Hi everyone, I am considering using Flink in a project. The setting would be a YARN cluster where data is first read in from HDFS, then processed, and finally written into an Oracle database using an upsert command. If I understand the documentation correctly, the DataSet API would be the natural choice ...
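
[Editor's note: for reference, the shape of such a job in the DataSet API could look as follows. This is a sketch, not a tested implementation: it assumes the Tuple-based JDBCOutputFormat of the Flink releases current at the time of this thread (later versions take Row instead), a two-column CSV of (id, balance) pairs on HDFS, and a hypothetical ACCOUNTS(ID, BALANCE) table; paths, hostnames, and credentials are placeholders.]

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class HdfsToOracleJob {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Read (id, balance) pairs from a two-column CSV on HDFS.
            DataSet<Tuple2<Integer, Double>> records = env
                .readCsvFile("hdfs:///data/balances.csv")
                .types(Integer.class, Double.class);

            // Placeholder for the actual processing step.
            DataSet<Tuple2<Integer, Double>> processed = records.map(
                new MapFunction<Tuple2<Integer, Double>, Tuple2<Integer, Double>>() {
                    @Override
                    public Tuple2<Integer, Double> map(Tuple2<Integer, Double> value) {
                        return value;
                    }
                });

            // Upsert into Oracle: record field 0 binds to the first ?,
            // field 1 to the second.
            processed.output(JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("oracle.jdbc.OracleDriver")
                .setDBUrl("jdbc:oracle:thin:@//dbhost:1521/orcl")
                .setUsername("flink")
                .setPassword("secret")
                .setQuery("MERGE INTO accounts t "
                    + "USING (SELECT ? AS id, ? AS balance FROM dual) s "
                    + "ON (t.id = s.id) "
                    + "WHEN MATCHED THEN UPDATE SET t.balance = s.balance "
                    + "WHEN NOT MATCHED THEN INSERT (id, balance) "
                    + "VALUES (s.id, s.balance)")
                .finish());

            env.execute("HDFS -> Oracle upsert");
        }
    }

[The MERGE makes the sink an upsert, which also addresses the duplicate-on-restart concern raised later in the thread.]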