Subject: Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values
Thanks for your input, Gen and Arnaud.
I do agree with you, Gen, that we need better guidance for our users on when to change the heartbeat configuration. I think this should happen in any case. I am, however
> HBase). While most of those calls are very fast, when the system is under heavy load they may sometimes block for more than a few seconds, and having our app killed because of a short timeout is not an option.
>
> Arnaud
>
> *From:* Gen Luo <luogen...@gmail.com>
> *Sent:* Thursday, 22 July 2021 05:46
> *To:* Till Rohrmann <trohrm...@apache.org>
> *Cc:* Yang Wang <danrtsey...@gmail.com>;
> (I understand that normally, since user code is not a JVM-blocking activity such as a GC, it should have no impact on heartbeats, but from experience, it really does.)
Cheers,
Arnaud
From: Gen Luo
Sent: Thursday, 22 July 2021 05:46
To: Till Rohrmann
Cc: Yang Wang; dev; user
Subject: Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values
Hi,
Thanks for driving this, @Till Rohrmann. I would give +1 on reducing the heartbeat timeout and interval, though I'm not sure that 15s and 3s would be enough either.
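To make the trade-off concrete, here is a rough Python sketch comparing the current defaults (50s timeout, 10s interval) with the proposed values (15s timeout, 3s interval). It assumes a simplified worst-case model, not Flink's actual implementation: a peer is declared lost once no heartbeat has arrived for the full timeout, and the last heartbeat before a GC pause or blocking call may have arrived up to one interval earlier. `HeartbeatConfig`, `max_tolerated_pause`, and `report` are hypothetical names for illustration, and the pause lengths are made-up examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatConfig:
    """Hypothetical holder for the two settings discussed in this thread."""

    name: str
    timeout_s: float   # value of heartbeat.timeout, in seconds
    interval_s: float  # value of heartbeat.interval, in seconds

    def beats_per_timeout(self) -> float:
        # Number of heartbeat intervals that fit into one timeout window.
        return self.timeout_s / self.interval_s

    def max_tolerated_pause(self) -> float:
        # Simplified worst case (an assumption, not Flink's exact logic):
        # the last heartbeat arrived one full interval before the pause
        # started, so only timeout - interval seconds of pause remain.
        return self.timeout_s - self.interval_s


# Values quoted in the thread: the Flink 1.5+ defaults and the proposal.
CONFIGS = [
    HeartbeatConfig("current default", timeout_s=50.0, interval_s=10.0),
    HeartbeatConfig("proposed", timeout_s=15.0, interval_s=3.0),
]


def report(pauses_s):
    """For each config, list which pause lengths survive without a timeout."""
    rows = []
    for cfg in CONFIGS:
        survived = [p for p in pauses_s if p < cfg.max_tolerated_pause()]
        rows.append((cfg.name, cfg.beats_per_timeout(), survived))
    return rows


if __name__ == "__main__":
    # Example pause lengths (seconds) for GC or blocking calls; illustrative only.
    for name, beats, survived in report([5.0, 12.0, 20.0]):
        print(f"{name}: {beats:.0f} intervals per timeout, survives pauses {survived}")
```

Under this model both settings keep the same ratio of five intervals per timeout; the proposal mainly shrinks the tolerated pause from roughly 40s to roughly 12s, which is exactly the concern raised about GC pauses and blocking calls.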
IMO, except for the standalone cluster, where Flink relies entirely on the heartbeat mechanism, reducing the heartbeat can also

Thanks for sharing these insights.
I think it is no longer true that the ResourceManager notifies the JobMaster about lost TaskExecutors. See FLINK-23216 [1] for more details.
Given the GC pauses, would you then be OK with decreasing the heartbeat timeout to 20 seconds? This should give enough ti

Thanks, @Till Rohrmann, for starting this discussion.
First, I am trying to understand the benefit of a shorter heartbeat timeout. IIUC, it will make the JobManager aware of a lost TaskManager faster. However, it seems that only the standalone cluster could benefit from this. For YARN and native Kubernetes depl

+1 to this change!
When I was working on the reactive mode blog post [1], I also ran into this issue, which led to a poor "out of the box" experience when scaling down.
For my experiments, I chose a timeout of 8 seconds, and the cluster has been running for 76 days (so far) on Kubernetes.
I also

Hi everyone,
Since Flink 1.5, we have had the same default values for the heartbeat timeout and interval: heartbeat.timeout: 50s and heartbeat.interval: 10s. These values were mainly chosen to compensate for lengthy GC pauses and blocking operations that were executed in the main threads of