Yes Nico, I have evaluated this. Here is what I tried:
1. Take the savepoint
2. Stop the job
3. Shut down the instances
4. Start a new pod using the command below:

/docker-entrypoint.sh "standalone-job" "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" "-Ds3.endpoint=${AWS_S3_ENDPOINT}" "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" "--job-classname" "com.test.MySpringBootApp" "--fromSavepoint" "s3://s3-health-service-discovery/savepoints" ${args}

I haven't observed any errors during start-up in the logs, but the state got reset, i.e. the values stored inside the accumulator were flushed.

On Tue, Oct 5, 2021 at 9:40 PM Nicolaus Weidner <nicolaus.weid...@ververica.com> wrote:

> Hi Parag,
>
> I am not so familiar with the setup you are using, but did you check out
> [1]? Maybe the parameter
> [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]]
> is what you are looking for?
>
> Best regards,
> Nico
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#application-mode-on-docker
>
> On Tue, Oct 5, 2021 at 12:37 PM Parag Somani <somanipa...@gmail.com> wrote:
>
>> Hello,
>>
>> We are currently using Apache Flink 1.12.0 deployed on a k8s cluster (1.18) with ZooKeeper for HA. Due to certain vulnerabilities in the container related to a few jars (like netty-*, mesos), we are forced to upgrade.
>>
>> While upgrading Flink to 1.14.0, we faced an NPE:
>> https://issues.apache.org/jira/browse/FLINK-23901?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17402570#comment-17402570
>>
>> To address it, I followed these steps:
>>
>> 1. Create a savepoint
>> 2. Stop the job
>> 3. Restore from the savepoint -- which is where I am facing a challenge.
>>
>> For step 3 above, I was able to restore from a savepoint mainly via
>> "bin/flink run -s :savepointPath [:runArgs]",
>> but that is mainly about restarting an uploaded jar file. As our application is based on k8s and runs using Docker, I was not able to restore it that way. Because of that, the state of the variables in the accumulator got corrupted and I lost the data in one of the environments.
>>
>> My query is: what is the preferred way to restore from a savepoint if the application is running on k8s using Docker?
>>
>> We are using the following command to run the job manager:
>> /docker-entrypoint.sh "standalone-job" "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" "-Ds3.endpoint=${AWS_S3_ENDPOINT}" "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" "--job-classname" "<class-name>" ${args}
>>
>> Thank you in advance...!
>>
>> --
>> Regards,
>> Parag Surajmal Somani.

--
Regards,
Parag Surajmal Somani.
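
One thing worth double-checking in the command above: --fromSavepoint is given the savepoints *base* directory, not one specific savepoint inside it. Flink restores from the directory that contains the savepoint's _metadata file, so passing the parent directory may leave the job starting with fresh state, which would match the reset accumulator. A minimal sketch of the restart, assuming a hypothetical savepoint directory name (the real one can be found by listing the bucket):

```shell
#!/bin/sh
# Sketch: restart the standalone-job from a *concrete* savepoint directory.
# "savepoint-abc123-0123456789ab" is a hypothetical name; the real one
# (the directory holding the _metadata file) can be found with e.g.:
#   aws s3 ls s3://s3-health-service-discovery/savepoints/
SAVEPOINT_PATH="s3://s3-health-service-discovery/savepoints/savepoint-abc123-0123456789ab"

set -- \
  "standalone-job" \
  "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" \
  "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" \
  "-Ds3.endpoint=${AWS_S3_ENDPOINT}" \
  "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" \
  "--job-classname" "com.test.MySpringBootApp" \
  "--fromSavepoint" "${SAVEPOINT_PATH}"

# Echoed here so the assembled command can be inspected; the real pod would
# instead exec:  /docker-entrypoint.sh "$@" ${args}
echo /docker-entrypoint.sh "$@"
```

Two further things worth checking: Nico's [--allowNonRestoredState] flag can be appended if some state should deliberately be dropped after the 1.12 to 1.14 upgrade, and since ZooKeeper HA is enabled, a restarted job that still finds old HA entries may recover from the latest HA checkpoint rather than the savepoint, so clearing the stale HA state (or using a fresh cluster-id) before restarting may also matter.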