Hey Thomas,

Hmm, I see no reason why you should not be able to update the checkpoint interval at runtime, and I don't believe that information is stored in a savepoint. Can you share the JobManager logs of the job where this is ignored?
Thanks,
Austin

On Wed, Jul 21, 2021 at 11:47 AM Thms Hmm <thms....@gmail.com> wrote:

> Hey Austin,
>
> Thanks for your help.
>
> I tried to change the checkpoint interval as an example. The value for it
> comes from an additional config file and is read and set within main() of
> the job.
>
> The job is running in Application mode. Basically the same configuration
> as from the official Flink website, but instead of running the JobManager
> as a job, it is created as a deployment.
>
> For the redeployment of the job, the REST API is triggered to create a
> savepoint and cancel the job. After completion, the deployment is updated
> and the pods are recreated. The -s <latest_savepoint> is always added as a
> parameter to start the JobManager (standalone-job.sh). The CLI is not
> involved; we have automated these steps. But I tried the steps manually
> and got the same results.
>
> I also tried to trigger a savepoint, scale the pods down, update the
> start parameter with the recent savepoint, and renamed
> 'kubernetes.cluster-id' as well as 'high-availability.storageDir'.
>
> When I trigger a savepoint with cancel, I also see that the HA config
> maps are cleaned up.
>
> Kr Thomas
>
> Austin Cawley-Edwards <austin.caw...@gmail.com> wrote on Wed., Jul 21,
> 2021 at 16:52:
>
>> Hi Thomas,
>>
>> I've got a few questions that will hopefully help find an answer:
>>
>> What job properties are you trying to change? Something like
>> parallelism?
>>
>> What mode is your job running in? I.e., Session, Per-Job, or
>> Application?
>>
>> Can you also describe how you're redeploying the job? Are you using the
>> Native Kubernetes integration or Standalone (i.e., writing k8s manifest
>> files yourself)? It sounds like you are using the Flink CLI as well, is
>> that correct?
>>
>> Thanks,
>> Austin
>>
>> On Wed, Jul 21, 2021 at 4:05 AM Thms Hmm <thms....@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> we have some application clusters running on Kubernetes and are
>>> exploring the HA mode, which is working as expected. When we try to
>>> upgrade a job, e.g. trigger a savepoint, cancel the job, and redeploy,
>>> Flink does not restart from the savepoint we provide using the -s
>>> parameter, so all state is lost.
>>>
>>> If we just trigger the savepoint without canceling the job and
>>> redeploy, the HA mode picks up from the latest savepoint.
>>>
>>> But this way we cannot upgrade job properties, as they seem to be
>>> picked up from the savepoint.
>>>
>>> Is there any advice on how to do upgrades with HA enabled?
>>>
>>> Flink version is 1.12.2.
>>>
>>> Thanks for your help.
>>>
>>> Kr Thomas
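
[Editor's note: for readers following the upgrade procedure discussed in this thread, the steps Thomas describes (savepoint-with-cancel via the REST API, then restarting the standalone JobManager from that savepoint) can be sketched roughly as below. The JobManager address, job id, savepoint directory, and job classname are placeholders, and jq is assumed to be available; endpoints follow Flink's monitoring REST API as of 1.12. This is an illustrative sketch, not the poster's exact automation.]

```shell
#!/usr/bin/env sh
JM="http://jobmanager:8081"      # placeholder JobManager REST address
JOB_ID="<your-job-id>"           # placeholder job id

# 1. Trigger a savepoint and cancel the job in one call ("cancel-job": true).
#    The response contains a trigger id under "request-id".
TRIGGER_ID=$(curl -s -X POST "${JM}/jobs/${JOB_ID}/savepoints" \
  -H "Content-Type: application/json" \
  -d '{"target-directory": "s3://bucket/savepoints", "cancel-job": true}' \
  | jq -r '."request-id"')

# 2. Poll the savepoint status until COMPLETED, then read the savepoint path
#    from "operation.location".
SAVEPOINT=$(curl -s "${JM}/jobs/${JOB_ID}/savepoints/${TRIGGER_ID}" \
  | jq -r '.operation.location')

# 3. Update the deployment so the JobManager entrypoint is started with
#    -s <savepoint>, as described in the thread (standalone Application mode).
./bin/standalone-job.sh start \
  --job-classname com.example.MyJob \
  -s "${SAVEPOINT}"
```

Note that step 2 is simplified: in practice the status endpoint must be polled in a loop until the operation reports COMPLETED before the location field is present.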