Hi Konstantin,

Thanks for the response. What still concerns me is:
1. Am I able to recover from checkpoints even if I change my program (for example, changing Filter and Map functions, data objects, ...)? I was not able to recover from savepoints when I changed my program.

On Mon, Mar 25, 2019 at 3:19 PM Konstantin Knauf <konstan...@ververica.com> wrote:

> Hi Son,
>
> Yes, this is possible, but your sink needs to play its part in Flink's
> checkpointing mechanism. Depending on the implementation of the sink, you
> should either:
>
> * implement *CheckpointedFunction* and flush all records to BigQuery in
>   *snapshotState*. This way, in case of a failure/restart of the job, all
>   records up to the last successful checkpoint will have been written to
>   BigQuery and all other records will be replayed.
> * use managed operator state to store all pending records in the sink, so
>   that they are persisted in *snapshotState*. This way, in case of a
>   failure/restart of the job, all records up to the last successful
>   checkpoint which have not yet been written to BigQuery will be restored
>   in the sink, and all other records will be replayed.
>
> In both cases, you might write the same record to BigQuery twice.
>
> If in doubt whether your sink fulfills the criteria above, feel free to
> share it.
>
> Cheers,
>
> Konstantin
>
> On Mon, Mar 25, 2019 at 7:50 AM Son Mai <hongson1...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a topic in Kafka that Flink reads from. I parse the messages in
>> this topic and write them to BigQuery using streaming inserts, in batches
>> of 500 messages collected in a CountWindow in Flink.
>>
>> *Problem*: I want to commit offsets manually only after a batch has been
>> written successfully to BigQuery.
>>
>> *Reason:* I saw that Flink's KafkaConsumer does not rely on committing
>> offsets to Kafka but uses its own checkpointing. I don't know how Flink
>> checkpointing works, and I'm worried that it does not cover the following
>> situation:
>> - Say I have a Flink job running and processing a batch of 500 messages
>>   at Kafka offsets 1000-1500.
>> - I stop this job before it writes to BigQuery and make some
>>   modifications to the program. Savepoints did not work when I tried,
>>   because they require that the operator code does not change.
>>
>> What I want is that when I start the modified app, it starts again from
>> offsets 1000-1500 in Kafka, because these messages have not been written
>> to BigQuery.
>>
>> Is there any way to achieve this in Flink?
>>
>> Thanks,
>> SM
>
>
> --
>
> Konstantin Knauf | Solutions Architect
>
> <https://www.ververica.com/>
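(For later readers of this thread: below is a minimal sketch of the first approach described in the quoted reply, i.e. a sink that buffers rows and flushes them in *snapshotState*. It is an illustration, not the actual sink from this thread: the String rows, the batch size, and the abstract writeBatch method are placeholders for whatever BigQuery client call the job really uses.)

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

/**
 * Sketch of a sink that buffers rows (here simply JSON strings) and flushes
 * the buffer whenever it reaches the batch size or a checkpoint is taken.
 * Because the flush happens in snapshotState, a checkpoint only completes
 * after the buffered rows have been written; on a failure/restart, Flink
 * falls back to the last successful checkpoint and the Kafka source replays
 * everything after it.
 */
public abstract class BufferingBigQuerySink extends RichSinkFunction<String>
        implements CheckpointedFunction {

    private static final int BATCH_SIZE = 500;

    // Rows received since the last flush; never outlives a checkpoint.
    private transient List<String> buffer;

    /** Implemented by the job with its actual BigQuery streaming-insert call. */
    protected abstract void writeBatch(List<String> rows) throws Exception;

    @Override
    public void initializeState(FunctionInitializationContext context) {
        // Nothing to restore: pending rows were flushed when the last
        // checkpoint was taken, and anything newer is replayed from Kafka.
        buffer = new ArrayList<>();
    }

    @Override
    public void invoke(String row, Context context) throws Exception {
        buffer.add(row);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Throwing here fails the checkpoint, so a checkpoint never completes
        // unless the pending rows really made it to BigQuery.
        flush();
    }

    private void flush() throws Exception {
        if (!buffer.isEmpty()) {
            writeBatch(buffer);
            buffer.clear();
        }
    }
}
```

The second approach in the quoted reply differs only in what happens at checkpoint time: instead of flushing, the pending rows are stored in a ListState obtained from context.getOperatorStateStore() in initializeState, and re-read into the buffer after a restart. Either way, as noted above, the same row can still end up in BigQuery twice.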