Hi Gyula,

Thanks for answering my questions!

> Savepoint upgrades on the other hand would generate a new job id (at least after a recent fix on operator main).

Yes, savepoints can help. However, IMO savepoints are not ideal compared with checkpoints, for two reasons:
1) Performance: a savepoint is a full snapshot, which can take a long time, especially for jobs with large state.
2) Availability: the Flink job must be running for a savepoint to be created.

So simply using savepoints instead of checkpoints for every job redeployment/upgrade may not be practical (e.g. the job downtime could exceed our SLA).
My understanding is that the "last-state" upgrade mode is recommended for near-real-time use cases, which are usually sensitive to latency (including job downtime). So whenever we want to redeploy a Flink job, we just do an in-place update of the existing FlinkDeployment with "last-state" enabled; in other words, we use a job-failover trick to achieve job redeployment.

What are your thoughts on using "last-state" vs. "savepoint"? Would you mind sharing how you decide between "last-state" and "savepoint" in production?

> I am actually working on adding a new way to perform the last-state upgrade via simple cancellation but that's a slightly orthogonal question.

Will this new way generate a new job.id during a last-state upgrade?

Thanks,
Alan

On Tue, Aug 20, 2024 at 10:17 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi Alan!
>
> The job.id remains the same as the last-state mode uses Flink's internal
> failover mechanism to access the state. We cannot change the job.id while
> doing this, unfortunately.
>
> Savepoint upgrades on the other hand would generate a new job id (at least
> after a recent fix on operator main). I am actually working on adding a new
> way to perform the last-state upgrade via simple cancellation, but that's a
> slightly orthogonal question.
>
> Long story short, if you really need to integrate this with the history
> server, then you should switch to savepoint upgrades.
>
> Cheers,
> Gyula
>
> On Wed, Aug 21, 2024 at 12:14 AM Alan Zhang <shuai....@gmail.com> wrote:
>
>> Hi,
>>
>> We are using the Apache Flink Kubernetes operator to manage the deployment
>> lifecycle of our Flink jobs, with application mode and the "last-state"
>> upgrade mode for each FlinkDeployment.
>>
>> As far as I know, each FlinkDeployment keeps using the same job id across
>> different job deployments/upgrades, because the operator uses the job
>> failover mechanism to implement the "last-state" upgrade mode.
>> However, with this approach it seems impossible to integrate with the Flink
>> history server, which uses the job.id to differentiate job deployments.
>>
>> Questions:
>>
>> - Is there any way to make the job.id different for the "last-state"
>>   upgrade mode?
>> - What would be the right way to enable the Flink history server in this
>>   case?
>>