Hi all,
The user mailing list thread[1] has been open for 13 days, and it has
collected one useful suggestion.
> Given that the new default feels more complex than the current behavior,
if we decide to do this I think it will be important to include the
rationale you've shared in the documentation.
I will add the re
Thank you Rui. It makes sense to me now.
On Thu, Nov 16, 2023 at 2:57 AM Rui Fan <1996fan...@gmail.com> wrote:
> Hi all,
>
> Zhu and I had an offline discussion today. We would prefer that this FLIP
> focus on improving exponential-delay and use exponential-delay
> as the default strategy. It means this
Hi all,
Zhu and I had an offline discussion today. We would prefer that this FLIP
focus on improving exponential-delay and use exponential-delay
as the default strategy. It means this FLIP doesn't include
improvements related to fixed-delay and failover-delay, and the
second part of the FLIP (Improve restartAt
Hi Zhu and Matthias:
> 3. failure counting
> Flink currently tries to recognize concurrent failures and group them
> together, which can be seen in the web UI. So how about aligning the
> failure counting with the concurrent failure computation? This can make it
> more consistent and easier for
Hi Zhu, Jing and Mingliang:
Thanks for your feedback about considering exponential-delay
as the default restart strategy and updating the default
values of exponential-delay as well. I have started a
discussion about it on the user, user-zh and dev mailing lists[1].
[1] https://lists.apache.org/thread/6glz
Thanks for sharing your data points.
Among a few thousand jobs (ranging from 1 task manager at the smallest to
300+ task managers at the largest), I presume most of them use the default. However, the
default values we have been using were not broadly discussed but instead
based on a priori knowledge as we mana
Hi Mingliang:
Thank you for the feedback here!
Glad to hear Netflix has made exponential-delay the
default restart strategy. Our production environment (Shopee) has also used
exponential-delay as the default since May 2021, and the
current number of Flink jobs far exceeds tens of thousands.
These jobs work
Thanks Rui for driving this. I just want to call out that making exponential-delay
the default is a good change. At Netflix, we enabled this as the
default restart strategy two quarters ago and it has been working well.
Keeping it restarting indefinitely by default makes sense to me.
On Mon, Oct 16, 20
Awesome! @Rui Thanks for your effort! Appreciate it!
Best regards,
Jing
On Tue, Nov 14, 2023 at 1:32 PM Rui Fan <1996fan...@gmail.com> wrote:
> Thanks a lot Zhu and Jing for the comments!
>
> Regarding the concurrent failures mentioned by Zhu, I was not familiar with
> them before
> and need some time
Thanks a lot Zhu and Jing for the comments!
Regarding the concurrent failures mentioned by Zhu, I was not familiar with
them before and need some time to look into them. So I will reply to them later.
I will give Jing an answer first:
> NIT: @Rui it would be great if you could point out the sourc
Hi Rui,
Thanks for the proposal! I agree with Zhu that any change to the default
behavior will have an impact on users' jobs in the production environment, and
it would be necessary to get users' attention to avoid
any surprises after upgrading Flink.
@Zhu
for 1, if we change the default values
Hi Rui,
Thanks for creating this FLIP and sorry for jumping in so late into the
discussion.
The improvements to the exponential-delay strategy and making it the default
strategy look good to me in general. I have some comments on them, as well
as on the failure counting.
1. default values of exponen
I'll start voting next Monday if there isn't any other comment.
Best,
Rui
On Thu, Oct 19, 2023 at 6:59 PM Rui Fan <1996fan...@gmail.com> wrote:
> Hi Konstantin and Max,
>
> Thanks for your feedback!
>
> Sorry, I forgot to mention the default value of
> `restart-strategy.exponential-delay.max-att
Hi Konstantin and Max,
Thanks for your feedback!
Sorry, I forgot to mention the default value of
`restart-strategy.exponential-delay.max-attempts-before-reset-backoff`.
Retrying forever sounds good to me, and I have added it to the FLIP:
The default value of
`restart-strategy.exponential-delay.max-
Hey Rui,
+1 for making exponential backoff the default. I agree with Konstantin
that retrying forever is a good default for exponential backoff
because oftentimes the issue will resolve eventually. The purpose of
exponential backoff is precisely to continue to retry without causing
too much load.
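
To illustrate the point about retrying without causing too much load, here is a minimal sketch of how exponential backoff spaces out restarts. It is not Flink's implementation: the function name `backoff_delays` and the values for `initial_s`, `multiplier` and `max_s` are hypothetical and chosen only for illustration, not taken from the FLIP or from Flink's defaults.

```python
def backoff_delays(initial_s=1.0, multiplier=2.0, max_s=60.0, attempts=8):
    """Return the delay (in seconds) before each of the first `attempts` restarts.

    All parameter names and values are illustrative assumptions,
    not Flink configuration options or defaults.
    """
    delays = []
    delay = initial_s
    for _ in range(attempts):
        delays.append(delay)
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(delay * multiplier, max_s)
    return delays


print(backoff_delays())
```

With a cap in place, the job keeps retrying indefinitely, but once the cap is reached the restart rate stays constant instead of hammering the cluster.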
Hi Rui,
Thank you for this proposal and for working on it. I also agree that
exponential backoff makes sense as a new default in general. I think
restarting indefinitely (no max attempts) makes sense by default,
though of course allowing users to change this is valuable.
So, overall +1.
Cheers,