Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-12-19 Thread Rui Fan
Thanks everyone for the feedback! There has been no further feedback here, so I have just started a new vote[1] to update the default value of backoff-multiplier from 1.2 to 1.5. [1] https://lists.apache.org/thread/0b1dcwb49owpm6v1j8rhrg9h0fvs5nkt Best, Rui On Tue, Dec 12, 2023 at 7:14 PM Maximilia

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-12-12 Thread Maximilian Michels
Thank you Rui! I think a 1.5 multiplier is a reasonable tradeoff between restarting fast and not putting too much pressure on the cluster due to restarts. -Max On Tue, Dec 12, 2023 at 8:19 AM Rui Fan <1996fan...@gmail.com> wrote: > > Hi Maximilian and Mason, > > Thanks a lot for your feedback! >
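
For readers following the multiplier discussion, here is a minimal sketch of how an exponential backoff grows under the two multipliers mentioned in this thread (1.2 and 1.5). The function names are hypothetical, and the 1 s initial delay and 60 s cap are assumptions chosen for illustration, not values taken from the thread. Jitter and backoff reset are deliberately left out, so this is not a description of Flink's actual implementation.

```python
def backoff_delays(initial_s, multiplier, max_s, attempts):
    """Return the delay before each of the first `attempts` restarts.

    Each delay is the previous one times `multiplier`, capped at `max_s`.
    All parameters are illustrative assumptions, not Flink defaults.
    """
    delays = []
    delay = initial_s
    for _ in range(attempts):
        delays.append(min(delay, max_s))
        delay = min(delay * multiplier, max_s)
    return delays


def restarts_until_cap(initial_s, multiplier, max_s):
    """Count how many consecutive failures it takes for the delay to reach the cap."""
    count = 0
    delay = initial_s
    while delay < max_s:
        delay *= multiplier
        count += 1
    return count


if __name__ == "__main__":
    for m in (1.2, 1.5):
        first = [round(d, 2) for d in backoff_delays(1.0, m, 60.0, 6)]
        print(f"multiplier={m}: first delays (s) = {first}, "
              f"restarts to reach cap = {restarts_until_cap(1.0, m, 60.0)}")
```

Under these assumed values, a larger multiplier reaches the cap after fewer consecutive failures, which means fewer rapid restarts hitting the cluster, while the first few restarts still happen within a few seconds.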

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-12-11 Thread Rui Fan
Hi Maximilian and Mason, Thanks a lot for your feedback! After an offline consultation with Max, I think I now understand your concern: when a Flink job restarts, it makes a bunch of calls to the Kubernetes API, e.g. reading/writing config maps and creating task managers. Currently, the default re

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-12-07 Thread Maximilian Michels
Hey Rui, +1 for changing the default restart strategy to exponential-delay. This is something all users eventually run into. They end up changing the restart strategy to exponential-delay. I think the current defaults are quite balanced. Restarts happen quickly enough unless there are consecutive

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-12-05 Thread Mason Chen
Hi Rui, Sorry for the late reply. I was suggesting that perhaps we could do some testing with Kubernetes wrt configuring values for the exponential restart strategy. We've noticed that the default strategy in 1.17 caused a lot of requests to the K8s API server for unstable deployments. However, p

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-11-19 Thread Rui Fan
Hi David and Mason, Thanks for your feedback! To David: > Given that the new default feels more complex than the current behavior, if we decide to do this I think it will be important to include the rationale you've shared in the documentation. That makes sense to me, I will add the related do

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-11-17 Thread Mason Chen
Hi Rui, I suppose we could do some benchmarking on what works well for the resource providers that Flink relies on e.g. Kubernetes. Based on conferences and blogs, it seems most people are relying on Kubernetes to deploy Flink and the restart strategy has a large dependency on how well Kubernetes

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-11-17 Thread David Anderson
Rui, I don't have any direct experience with this topic, but given the motivation you shared, the proposal makes sense to me. Given that the new default feels more complex than the current behavior, if we decide to do this I think it will be important to include the rationale you've shared in the