Hey Hemanga,

It's quite annoying that MirrorMaker changes the offsets on you. One solution would be to use the State Processor API[1] to read the savepoint and rewrite the stored offsets to the new ones. Does MirrorMaker give you these ahead of time? People might also be able to suggest more specific tricks if you can share which cloud/cloud services you're migrating from and to.
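Below is a rough illustration of the "rewrite the offsets" step. It is a minimal Java sketch of the remapping logic only, not the State Processor API job around it.

It rests on several assumptions:
- You can obtain, per partition, some (source offset -> target offset) pairs from your replication setup.
- Those pairs use the same offset convention as the savepoint (last processed vs. next to read).
- The class name, the "topic-partition" string keys and the pairs table are all hypothetical.

When a saved offset has no exact match, the sketch rewinds to the nearest earlier pair. That trades some reprocessing for never skipping records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Flink or Kafka: remaps per-partition offsets
// read from a savepoint onto the replicated topics on the destination cluster.
final class OffsetRemapSketch {

    private OffsetRemapSketch() {}

    /**
     * @param savedOffsets partition key (e.g. "input-0") -> offset taken from the savepoint
     * @param syncPairs    partition key -> (source offset -> target offset) pairs; ASSUMED to
     *                     come from your replication setup, using the same offset convention
     *                     as the savepoint
     * @return partition key -> offset to resume from on the destination cluster
     */
    static Map<String, Long> remap(Map<String, Long> savedOffsets,
                                   Map<String, NavigableMap<Long, Long>> syncPairs) {
        Map<String, Long> remapped = new HashMap<>();
        for (Map.Entry<String, Long> saved : savedOffsets.entrySet()) {
            String partition = saved.getKey();
            NavigableMap<Long, Long> pairs = syncPairs.get(partition);
            if (pairs == null) {
                throw new IllegalArgumentException("no sync pairs for " + partition);
            }
            // Latest known pair whose source offset is at or before the saved offset.
            Map.Entry<Long, Long> floor = pairs.floorEntry(saved.getValue());
            if (floor == null) {
                throw new IllegalArgumentException(
                        "saved offset is older than every sync pair for " + partition);
            }
            // Exact match: the translated offset is used as-is. Otherwise this rewinds to
            // the pair's target offset, so records in between are reprocessed, none skipped.
            remapped.put(partition, floor.getValue());
        }
        return remapped;
    }

    public static void main(String[] args) {
        // Made-up sync pairs for one partition, for illustration only.
        NavigableMap<Long, Long> input0 = new TreeMap<>();
        input0.put(0L, 0L);
        input0.put(1000L, 950L);
        input0.put(2000L, 1930L);

        Map<String, NavigableMap<Long, Long>> syncPairs = new HashMap<>();
        syncPairs.put("input-0", input0);

        Map<String, Long> savedOffsets = new HashMap<>();
        savedOffsets.put("input-0", 1500L);

        System.out.println(remap(savedOffsets, syncPairs)); // prints {input-0=950}
    }
}
```

In a real migration, the map returned by remap would feed whatever writes the new Kafka source state into the savepoint. How that state is laid out depends on the connector version you use.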
Best,
Austin

[1]: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/libs/state_processor_api/

On Tue, May 3, 2022 at 5:11 PM Hemanga Borah <borah.hema...@gmail.com> wrote:

> Any ideas, guys?
>
> On Mon, May 2, 2022 at 6:11 PM Hemanga Borah <borah.hema...@gmail.com>
> wrote:
>
>> Hello,
>>
>> We are attempting to port our Flink applications from one cloud provider
>> to another.
>>
>> These Flink applications consume data from Kafka topics and output to
>> various destinations (Kafka or databases). The applications keep state.
>> Some of this state holds aggregations; for example, at times we store
>> hours' (or days') worth of data to aggregate over time. Other applications
>> cache information for data enrichment; for example, we keep data in Flink
>> state for days so that we can join it with newly arrived data. The input
>> topics carry a lot of data, and it would be expensive to reprocess them
>> from the beginning.
>>
>> As such, we want to retain the state of the applications when we move to
>> a different cloud provider, so that we keep the aggregations and cache
>> and do not have to start from the beginning of the input topics.
>>
>> We are replicating the Kafka topics using MirrorMaker 2. This is our
>> procedure:
>>
>> - Replicate the input topics of each Flink application from the source
>>   cloud to the destination cloud.
>> - Take a savepoint of the Flink application on the source cloud
>>   provider.
>> - Start the Flink application on the destination cloud provider using
>>   the savepoint from the source cloud provider.
>>
>> However, this does not work as we want: the offsets in the new topics on
>> the new cloud provider differ (due to the MirrorMaker implementation).
>> The offsets of the new topic do not match the ones stored in the Flink
>> savepoint, so Flink cannot map to the offsets of the new topic during
>> startup.
>>
>> Has anyone tried to move clouds while retaining the Flink state?
>>
>> Thanks,
>> Hemanga