Any ideas, guys?

On Mon, May 2, 2022 at 6:11 PM Hemanga Borah <borah.hema...@gmail.com> wrote:
> Hello,
> We are attempting to port our Flink applications from one cloud provider
> to another.
>
> These Flink applications consume data from Kafka topics and output to
> various destinations (Kafka or databases). The applications have states
> stored in them. Some of these stored states are aggregations, for example,
> at times we store hours (or days) worth of data to aggregate over time.
> Some other applications have cached information for data enrichment, for
> example, we store data in Flink state for days, so that we can join them
> with newly arrived data. The amount of data on the input topics is a lot,
> and it will be expensive to reprocess the data from the beginning of the
> topic.
>
> As such, we want to retain the state of the application when we move to a
> different cloud provider so that we can retain the aggregations and cache,
> and do not have to start from the beginning of the input topics.
>
> We are replicating the Kafka topics using MirrorMaker 2. This is our
> procedure:
>
>    - Replicate the input topics of each Flink application from source
>    cloud to destination cloud.
>    - Take a savepoint of the Flink application on the source cloud
>    provider.
>    - Start the Flink application on the destination cloud provider using
>    the savepoint from the source cloud provider.
>
> However, this does not work as we want because there is a difference in
> offset in the new topics in the new cloud provider (because of MirrorMaker
> implementation). The offsets of the new topic do not match the ones stored
> on the Flink savepoint, hence, Flink cannot map to the offsets of the new
> topic during startup.
>
> Has anyone tried to move clouds while retaining the Flink state?
>
> Thanks,
> Hemanga
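
To make the offset mismatch concrete, here is a small, self-contained Python simulation. It is only a sketch and does not talk to Kafka, MirrorMaker 2, or Flink: the `Record` type, the `mirror` and `offset_for_time` helpers, and all the numbers are made up for illustration. It assumes that mirrored records keep their original timestamps and that timestamps do not decrease within a partition. Under those assumptions, a position on the source topic can be mapped to the mirrored topic by timestamp rather than by offset, similar in spirit to a "first offset at or after this timestamp" lookup.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    timestamp_ms: int
    value: str


def mirror(source_log: List[Record]) -> List[Record]:
    """Copy records into a fresh log: timestamps and values are kept,
    but offsets restart at 0 (hypothetical stand-in for replication)."""
    return [Record(i, r.timestamp_ms, r.value) for i, r in enumerate(source_log)]


def offset_for_time(log: List[Record], timestamp_ms: int) -> Optional[int]:
    """Return the earliest offset whose timestamp is >= timestamp_ms,
    or None if no such record exists.
    Assumes timestamps are non-decreasing within the partition."""
    timestamps = [r.timestamp_ms for r in log]
    i = bisect_left(timestamps, timestamp_ms)
    return log[i].offset if i < len(log) else None


# Hypothetical source partition: older records have expired, so the
# first retained offset is 1000 rather than 0.
source = [
    Record(1000 + i, 1_650_000_000_000 + i * 1000, f"event-{i}")
    for i in range(5)
]
destination = mirror(source)

# Offset recorded on the source side (the next record to read).
# It does not exist as an offset on the destination at all.
savepoint_offset = 1003
next_record = next(r for r in source if r.offset == savepoint_offset)

# Map the position by timestamp instead of by offset.
translated = offset_for_time(destination, next_record.timestamp_ms)
print(savepoint_offset, "->", translated)
```

The same record sits at offset 1003 on the simulated source and at a much smaller offset on the simulated destination, which is why offsets taken from a savepoint cannot be reused as-is. Whether a timestamp-based starting position is acceptable for a given job depends on how that job restores its source state and on how it tolerates duplicates, which this sketch does not address.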