Hi Konstantin,

Thanks for the response. What still concerns me is:
1. Am I able to recover from checkpoints even if I change my program (for example, changing Filter and Map functions, data objects, ...)? I was not able to recover from savepoints when I changed my program.

On Mon, Mar 25, 2019 at 3:19 PM Konstantin Knauf <konstan...@ververica.com> wrote:

> Hi Son,
>
> Yes, this is possible, but your sink needs to play its part in Flink's
> checkpointing mechanism. Depending on the implementation of the sink, you
> should either:
>
> * implement *CheckpointedFunction* and flush all records to BigQuery in
>   *snapshotState*. This way, in case of a failure/restart of the job, all
>   records up to the last successful checkpoint will have been written to
>   BigQuery and all other records will be replayed.
> * use managed operator state to store all pending records in the sink, so
>   that they are persisted in *snapshotState*. This way, in case of a
>   failure/restart of the job, all records up to the last successful
>   checkpoint which have not yet been written to BigQuery will be restored
>   in the sink, and all other records will be replayed.
>
> In both cases, you might write the same record to BigQuery twice.
>
> If in doubt whether your sink fulfills the criteria above, feel free to
> share it.
>
> Cheers,
>
> Konstantin
>
> On Mon, Mar 25, 2019 at 7:50 AM Son Mai <hongson1...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a topic in Kafka that Flink reads from. I parse the messages in
>> this topic and write them to BigQuery using streaming inserts, in batches
>> of 500 messages collected in a CountWindow in Flink.
>>
>> *Problem*: I want to commit offsets manually only after a batch has been
>> written successfully to BigQuery.
>>
>> *Reason:* I saw that Flink's KafkaConsumer does not rely on committing
>> offsets to Kafka but uses its own checkpointing. I don't know how Flink
>> checkpointing works, and I'm worried that it does not cover the following
>> situation:
>> - Say I have a Flink job running and processing a batch of 500 messages
>>   at Kafka offsets 1000-1500.
>> - I stop this job before it writes to BigQuery and make some
>>   modifications to the program. Savepoints did not work when I tried,
>>   because they require that the operator code does not change.
>>
>> What I want is that when I start the modified app, it starts again from
>> offsets 1000-1500 in Kafka, because these messages have not been written
>> to BigQuery.
>>
>> Is there any way to achieve this in Flink?
>>
>> Thanks,
>> SM
>
>
> --
>
> Konstantin Knauf | Solutions Architect
>
> <https://www.ververica.com/>
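(For later readers of this thread: below is a minimal sketch of the first approach described in the quoted reply, i.e. a sink that buffers rows and flushes them in *snapshotState*. It is an illustration, not the actual sink from this thread: the String rows, the batch size, and the abstract writeBatch method are placeholders for whatever BigQuery client call the job really uses.)

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

/**
 * Sketch of a sink that buffers rows (here simply JSON strings) and flushes
 * the buffer whenever it reaches the batch size or a checkpoint is taken.
 * Because the flush happens in snapshotState, a checkpoint only completes
 * after the buffered rows have been written; on a failure/restart, Flink
 * falls back to the last successful checkpoint and the Kafka source replays
 * everything after it.
 */
public abstract class BufferingBigQuerySink extends RichSinkFunction<String>
        implements CheckpointedFunction {

    private static final int BATCH_SIZE = 500;

    // Rows received since the last flush; never outlives a checkpoint.
    private transient List<String> buffer;

    /** Implemented by the job with its actual BigQuery streaming-insert call. */
    protected abstract void writeBatch(List<String> rows) throws Exception;

    @Override
    public void initializeState(FunctionInitializationContext context) {
        // Nothing to restore: pending rows were flushed when the last
        // checkpoint was taken, and anything newer is replayed from Kafka.
        buffer = new ArrayList<>();
    }

    @Override
    public void invoke(String row, Context context) throws Exception {
        buffer.add(row);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Throwing here fails the checkpoint, so a checkpoint never completes
        // unless the pending rows really made it to BigQuery.
        flush();
    }

    private void flush() throws Exception {
        if (!buffer.isEmpty()) {
            writeBatch(buffer);
            buffer.clear();
        }
    }
}
```

The second approach in the quoted reply differs only in what happens at checkpoint time: instead of flushing, the pending rows are stored in a ListState obtained from context.getOperatorStateStore() in initializeState, and re-read into the buffer after a restart. Either way, as noted above, the same row can still end up in BigQuery twice.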