Hi Frank,

I'm not really familiar with the internal workings of the Spotify operator, but here are a few general notes:

- You only need the JM process for the REST API to become available (TMs can join in asynchronously). I'd personally aim for < 1m for this step; if it takes longer, it could signal a problem with your infrastructure (e.g. images taking a long time to pull, incorrect setup of liveness / readiness probes, not enough resources). A quick way to measure this is sketched at the bottom of this mail.

> The job is packaged as a fat jar, but it is already baked in the docker
> images we use (so technically there would be no need to "submit" it from a
> separate pod).

That's where the application mode comes in. Please note that this might also be one of the reasons for the previous steps taking too long (as all pods are pulling an image with your fat jar that might not be cached).

> Then the application needs to start up and load its state from the latest
> savepoint, which again takes a couple of minutes

This really depends on the state size, the state backend (e.g. a RocksDB restore might take longer) and the object store throughput / rate limits. The native-savepoint feature that will come out with 1.15 might help to shave off some time here, as there is no conversion into the state backend structures (rough CLI sketch at the bottom of this mail).

Best,
D.

On Fri, Mar 25, 2022 at 9:46 AM Frank Dekervel <fr...@kapernikov.com> wrote:

> Hello,
>
> We run flink using the spotify flink Kubernetes operator (job cluster
> mode). Everything works fine, including upgrades and crash recovery. We do
> not run the job manager in HA mode.
>
> One of the problems we have is that upon upgrades (or during testing), the
> startup time of the flink cluster takes a very long time:
>
>    - First the operator needs to create the cluster (JM+TM), and wait for
>    it to respond to api requests. This already takes a couple of minutes.
>    - Then the operator creates a job-submitter pod that submits the job
>    to the cluster. The job is packaged as a fat jar, but it is already baked
>    in the docker images we use (so technically there would be no need to
>    "submit" it from a separate pod). The submission goes rather fast tho (the
>    time between the job submitter seeing the cluster is online and the "hello"
>    log from the main program is < 1 min).
>    - Then the application needs to start up and load its state from the
>    latest savepoint, which again takes a couple of minutes.
>
> All steps take quite some time, and we are looking to reduce the startup
> time to allow for easier testing but also less downtime during upgrades. So
> i have some questions:
>
>    - I wonder if the situation is the same for all kubernetes operators.
>    I really need some kind of operator, because otherwise i would have to
>    set which savepoint to load from myself at every startup.
>    - What cluster startup time is considered to be acceptable / best
>    practice?
>    - If there are other tricks to reduce startup time, i would be very
>    interested in knowing them :-)
>
> There is also a discussion ongoing on running flink on spot nodes. I guess
> the startup time is relevant there too.
>
> Thanks already
> Frank
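
Rough sketch of the two things I mentioned above. The service name <flink-jobmanager>, <jobId> and the savepoint target path are placeholders that depend on your setup; 8081 is the default REST port; and the --type flag is what is currently planned for the 1.15 CLI, so treat that part as tentative:

    # Measure how long it takes for the JM REST API to come up: poll a cheap
    # endpoint (/overview) until it answers and let `time` report the wait.
    # Run this from a pod inside the cluster, against the JM service.
    time sh -c 'until curl -sf http://<flink-jobmanager>:8081/overview > /dev/null; do sleep 1; done'

    # Once 1.15 is out: trigger a savepoint in the native (state-backend
    # specific) format, so the restore skips the conversion step.
    ./bin/flink savepoint --type native <jobId> s3://<your-bucket>/savepoints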
- You only need the JM process for the REST API to become available (TMs can join in asynchronously). I'd personally aim for < 1m for this step, if it takes longer it could signal a problem with your infrastructure (eg. images taking long time to pull, incorrect setup of liveness / readiness probes, not enough resources). The job is packaged as a fat jar, but it is already baked in the docker > images we use (so technically there would be no need to "submit" it from a > separate pod). > That's where the application mode comes in. Please note that this might be also one of the reasons for previous steps taking too long (as all pods are pulling an image with your fat jar that might not be cached). Then the application needs to start up and load its state from the latest > savepoint, which again takes a couple of minutes > This really depends on the state size, state backend (eg. rocksdb restore might take longer), object store throughput / rate limit. The native-savepoint feature that will come out with 1.15 might help to shave off some time here as the there is no conversion into the state backend structures. Best, D. - On Fri, Mar 25, 2022 at 9:46 AM Frank Dekervel <fr...@kapernikov.com> wrote: > Hello, > > We run flink using the spotify flink Kubernetes operator (job cluster > mode). Everything works fine, including upgrades and crash recovery. We do > not run the job manager in HA mode. > > One of the problems we have is that upon upgrades (or during testing), the > startup time of the flink cluster takes a very long time: > > - First the operator needs to create the cluster (JM+TM), and wait for > it to respond for api requests. This already takes a couple of minutes. > - Then the operator creates a job-submitter pod that submits the job > to the cluster. The job is packaged as a fat jar, but it is already baked > in the docker images we use (so technically there would be no need to > "submit" it from a separate pod). The submission goes rather fast tho (the > time between the job submitter seeing the cluster is online and the "hello" > log from the main program is <1min) > - Then the application needs to start up and load its state from the > latest savepoint, which again takes a couple of minutes > > All steps take quite some time, and we are looking to reduce the startup > time to allow for easier testing but also less downtime during upgrades. So > i have some questions: > > - I wonder if the situation is the same for all kubernetes operators. > I really need some kind of operator because i otherwise i have to set which > savepoint to load from myself every startup. > - What cluster startup time is considered to be acceptable / best > practise ? > - If there are other tricks to reduce startup time, i would be very > interested in knowing them :-) > > There is also a discussion ongoing on running flink on spot nodes. I guess > the startup time is relevant there too. > > Thanks already > Frank > > > > > >