Hi Till,

I'd like to revive this thread since 1.9.0 has been released.
IMHO we have already reached a consensus on JIRA, and if you can review the
pull request, we can hopefully address the issue in the next release.

Best,
tison.


On Mon, Jul 29, 2019 at 11:05 PM Zili Chen <wander4...@gmail.com> wrote:

> Hi Till,
>
> Thanks for your explanation. Let's pick up this thread during 1.10
> development.
>
> Best,
> tison.
>
>
> On Mon, Jul 29, 2019 at 9:12 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Tison,
>>
>> I would consider this a new feature and as such it won't be possible to
>> include it in the 1.9.0 release since the feature freeze has passed.
>> We might target 1.10, though.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jul 29, 2019 at 3:01 AM Zili Chen <wander4...@gmail.com> wrote:
>>
>> > Hi committers,
>> >
>> > Now that we have an ongoing PR [1] for this JIRA, we need a committer
>> > to push this thread forward. It would be great to see this issue fixed
>> > in 1.9.0.
>> >
>> > Best,
>> > tison.
>> >
>> > [1] https://github.com/apache/flink/pull/9158
>> >
>> >
>> > On Tue, Jul 23, 2019 at 9:28 PM 未来阳光 <2217232...@qq.com> wrote:
>> >
>> > > OK, if you have any suggestions, we can talk about the details under
>> > > FLINK-10052.
>> > >
>> > >
>> > > Best.
>> > >
>> > >
>> > > ------------------ Original Message ------------------
>> > > From: "Till Rohrmann"<trohrm...@apache.org>;
>> > > Sent: Tuesday, July 23, 2019, 9:19 PM
>> > > To: "dev"<dev@flink.apache.org>;
>> > >
>> > > Subject: Re: [DISSCUSS] Tolerate temporarily suspended ZooKeeper connections
>> > >
>> > >
>> > >
>> > > Hi Lamber-Ken,
>> > >
>> > > thanks for starting this discussion. I think there is a benefit in not
>> > > directly losing leadership if the ZooKeeper connection goes into the
>> > > SUSPENDED state. In particular if we can guarantee that there is only a
>> > > single JobMaster, it might make sense to not overly eagerly give up
>> > > leadership. I would suggest continuing the technical discussion on the
>> > > JIRA issue thread since it already contains a good amount of detail.
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > > On Sat, Jul 20, 2019 at 12:55 PM QQ邮箱 <2217232...@qq.com> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > Description
>> > > > We deploy Flink streaming jobs on a Hadoop cluster in per-job mode and
>> > > > use ZooKeeper as the HighAvailabilityService. We found that Flink jobs
>> > > > restart whenever the network connection between the JobManager and
>> > > > ZooKeeper is interrupted temporarily, so we analyzed the problem in
>> > > > depth. The Flink JobManager uses Curator's `LeaderLatch` to maintain
>> > > > leadership. When the network disconnects, `LeaderLatch` immediately
>> > > > sets leadership to false. We think it is too aggressive that many
>> > > > long-running Flink jobs restart because of a brief network glitch.
>> > > > Instead of directly revoking the leadership upon a SUSPENDED ZooKeeper
>> > > > connection, it would be better to wait until the ZooKeeper connection
>> > > > is LOST.
>> > > >
>> > > > There are two JIRA issues about this problem, FLINK-10052 and
>> > > > FLINK-13189, and they are duplicates. Thanks to @Elias Levy for
>> > > > pointing out the duplicate; FLINK-13189 has therefore been closed.
>> > > >
>> > > > Solution
>> > > > Back to the problem: there are currently two ways to solve it. One is
>> > > > to rewrite the LeaderLatch#handleStateChange method, the other is to
>> > > > upgrade to curator-4.2.0. The first is hacky but correct; the second
>> > > > requires us to consider compatibility. For more detail, please see
>> > > > FLINK-10052.
>> > > >
>> > > > Hope
>> > > > FLINK-10052 was reported on 2018-08-03 (about a year ago), so we hope
>> > > > this problem can be fixed as soon as possible.
>> > > > By the way, thanks to @TisonKun for discussing this problem and
>> > > > reviewing the PR.
>> > > >
>> > > > Links
>> > > > FLINK-10052 https://issues.apache.org/jira/browse/FLINK-10052
>> > > > FLINK-13189 https://issues.apache.org/jira/browse/FLINK-13189
>> > > >
>> > > > Any suggestions are welcome. What do you think?
>> > > >
>> > > > Best, lamber-ken.
>> >
>> >
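
For reference, the behaviour proposed in the original message (keep leadership while the ZooKeeper connection is SUSPENDED and revoke it only once the connection is LOST) can be sketched as a small state machine. This is a minimal, language-neutral model in Python, not Curator's `LeaderLatch` code and not the change in the pull request: the class `ToleratingLatch`, its method `on_state_change`, and the helper `replay` are hypothetical names introduced only for illustration. Only the state names CONNECTED, SUSPENDED, RECONNECTED and LOST mirror Curator's connection states.

```python
from enum import Enum, auto


class ConnectionState(Enum):
    """Connection states, named after Curator's ConnectionState values."""
    CONNECTED = auto()
    SUSPENDED = auto()
    RECONNECTED = auto()
    LOST = auto()


class ToleratingLatch:
    """Hypothetical model of a leader latch that tolerates SUSPENDED."""

    def __init__(self, has_leadership=True):
        self.has_leadership = has_leadership
        self.revocations = 0  # number of times leadership was given up

    def on_state_change(self, new_state):
        """Update leadership for a connection event and return it."""
        if new_state is ConnectionState.LOST and self.has_leadership:
            # The connection is considered gone for good, so leadership
            # must be revoked; another contender may be elected.
            self.has_leadership = False
            self.revocations += 1
        # SUSPENDED is deliberately ignored: leadership is kept in the hope
        # that the connection comes back (RECONNECTED) before it is LOST.
        # Re-election after LOST is outside the scope of this sketch.
        return self.has_leadership


def replay(states):
    """Feed a sequence of connection events to a latch that starts as leader."""
    latch = ToleratingLatch(has_leadership=True)
    for state in states:
        latch.on_state_change(state)
    return latch


# A short network glitch: leadership survives, so the job is not restarted.
glitch = replay([ConnectionState.SUSPENDED, ConnectionState.RECONNECTED])

# A real outage: leadership is revoked exactly once, when LOST arrives.
outage = replay([ConnectionState.SUSPENDED, ConnectionState.LOST])
```

As noted earlier in the thread, holding on to leadership during SUSPENDED is only reasonable if it can be guaranteed that there is a single JobMaster in the meantime; the sketch does not model that guarantee.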