Hi Gary,

Thanks for the clarification!

When we upgrade to a new Flink release, we don't start with a default flink-conf.yaml but upgrade our existing tooling and configuration. Therefore we noticed this issue as part of the upgrade to 1.10, and not when we upgraded to 1.9. I would expect many other users to be in the same camp. Should we therefore consider making these regressions a blocker for 1.10?

Thanks,
Thomas

On Wed, Feb 5, 2020 at 4:53 AM Gary Yao <g...@apache.org> wrote:

> > also notice that the exception causing a restart is no longer displayed
> > in the UI, which is probably related?
>
> Yes, this is also related to the new scheduler. I created FLINK-15917 [1] to
> track this. Moreover, I created a ticket about the uptime metric not resetting
> [2]. Both issues already exist in 1.9 if
> "jobmanager.execution.failover-strategy" is set to "region", which is the case
> in the default flink-conf.yaml.
>
> In 1.9, unsetting "jobmanager.execution.failover-strategy" was enough to fall
> back to the previous behavior.
>
> In 1.10, you can still fall back to the previous behavior by setting
> "jobmanager.scheduler: legacy" and unsetting
> "jobmanager.execution.failover-strategy" in your flink-conf.yaml.
>
> I would not consider these issues blockers since there is a workaround for
> them, but of course we would like to see the new scheduler getting some
> production exposure. More detailed release notes about the caveats of the new
> scheduler will be added to the user documentation.
>
> > The watermark issue was https://issues.apache.org/jira/browse/FLINK-14470
>
> This should be fixed now [3].
>
> [1] https://issues.apache.org/jira/browse/FLINK-15917
> [2] https://issues.apache.org/jira/browse/FLINK-15918
> [3] https://issues.apache.org/jira/browse/FLINK-8949
>
> On Wed, Feb 5, 2020 at 7:04 AM Thomas Weise <t...@apache.org> wrote:
>
>> Hi Gary,
>>
>> Thanks for the reply.
>>
>> -->
>>
>> On Tue, Feb 4, 2020 at 5:20 AM Gary Yao <g...@apache.org> wrote:
>>
>> > Hi Thomas,
>> >
>> > > 2) Was there a change in how job recovery reflects in the uptime metric?
>> > > Didn't uptime previously reset to 0 on recovery (now it just keeps
>> > > increasing)?
>> >
>> > The uptime is the difference between the current time and the time when the
>> > job transitioned to RUNNING state. By default we no longer transition the job
>> > out of the RUNNING state when restarting. This has something to do with the
>> > new scheduler which enables pipelined region failover by default [1]. Actually
>> > we enabled pipelined region failover already in the binary distribution of
>> > Flink 1.9 by setting:
>> >
>> > jobmanager.execution.failover-strategy: region
>> >
>> > in the default flink-conf.yaml. Unless you have removed this config option or
>> > you are using a custom yaml, you should be seeing this behavior in Flink 1.9.
>> > If you do not want region failover, set
>> >
>> > jobmanager.execution.failover-strategy: full
>> >
>>
>> We are using the default (the jobmanager.execution.failover-strategy
>> setting is not present in our flink config).
>>
>> The change in behavior I see is between the 1.9 based deployment and the
>> 1.10 RC.
>>
>> Our 1.9 branch is here:
>> https://github.com/lyft/flink/tree/release-1.9-lyft
>>
>> I also notice that the exception causing a restart is no longer displayed
>> in the UI, which is probably related?
>>
>> >
>> > > 1) Is the low watermark display in the UI still broken?
>> >
>> > I was not aware that this is broken. Is there an issue tracking this bug?
>> >
>>
>> The watermark issue was https://issues.apache.org/jira/browse/FLINK-14470
>>
>> (I don't have a good way to verify it is fixed at the moment.)
>>
>> Another problem with this 1.10 RC is that the checkpointAlignmentTime
>> metric is missing. (I have not been able to investigate this further yet.)
>>
>> >
>> > Best,
>> > Gary
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-14651
>> >
>> > On Tue, Feb 4, 2020 at 2:56 AM Thomas Weise <t...@apache.org> wrote:
>> >
>> >> I opened a PR for FLINK-15868
>> >> <https://issues.apache.org/jira/browse/FLINK-15868>:
>> >> https://github.com/apache/flink/pull/11006
>> >>
>> >> With that change, I was able to run an application that consumes from
>> >> Kinesis.
>> >>
>> >> I should have data tomorrow regarding the performance.
>> >>
>> >> Two questions/observations:
>> >>
>> >> 1) Is the low watermark display in the UI still broken?
>> >> 2) Was there a change in how job recovery reflects in the uptime metric?
>> >> Didn't uptime previously reset to 0 on recovery (now it just keeps
>> >> increasing)?
>> >>
>> >> Thanks,
>> >> Thomas
>> >>
>> >> On Mon, Feb 3, 2020 at 10:55 AM Thomas Weise <t...@apache.org> wrote:
>> >>
>> >> > I found another issue with the Kinesis connector:
>> >> >
>> >> > https://issues.apache.org/jira/browse/FLINK-15868
>> >> >
>> >> > On Mon, Feb 3, 2020 at 3:35 AM Gary Yao <g...@apache.org> wrote:
>> >> >
>> >> >> Hi everyone,
>> >> >>
>> >> >> I am hereby canceling the vote due to:
>> >> >>
>> >> >> FLINK-15837
>> >> >> FLINK-15840
>> >> >>
>> >> >> Another RC will be created later today.
>> >> >>
>> >> >> Best,
>> >> >> Gary
>> >> >>
>> >> >> On Mon, Jan 27, 2020 at 10:06 PM Gary Yao <g...@apache.org> wrote:
>> >> >>
>> >> >> > Hi everyone,
>> >> >> > Please review and vote on the release candidate #1 for the version 1.10.0,
>> >> >> > as follows:
>> >> >> > [ ] +1, Approve the release
>> >> >> > [ ] -1, Do not approve the release (please provide specific comments)
>> >> >> >
>> >> >> > The complete staging area is available for your review, which includes:
>> >> >> > * JIRA release notes [1],
>> >> >> > * the official Apache source release and binary convenience releases to be
>> >> >> >   deployed to dist.apache.org [2], which are signed with the key with
>> >> >> >   fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
>> >> >> > * all artifacts to be deployed to the Maven Central Repository [4],
>> >> >> > * source code tag "release-1.10.0-rc1" [5],
>> >> >> >
>> >> >> > The announcement blog post is in the works. I will update this voting
>> >> >> > thread with a link to the pull request soon.
>> >> >> >
>> >> >> > The vote will be open for at least 72 hours. It is adopted by majority
>> >> >> > approval, with at least 3 PMC affirmative votes.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Yu & Gary
>> >> >> >
>> >> >> > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
>> >> >> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
>> >> >> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> >> >> > [4] https://repository.apache.org/content/repositories/orgapacheflink-1325
>> >> >> > [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
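
For reference, the 1.10 fallback described in Gary's Feb 5 reply could look roughly like the flink-conf.yaml fragment below. This is only a sketch assembled from the two option names quoted in the thread; it assumes no other scheduler-related settings are present in the file, and it has not been verified against the RC.

```yaml
# flink-conf.yaml (fragment): fall back to the pre-1.10 scheduling behavior.
# Both option names are taken from the thread above.

# Select the legacy scheduler instead of the new default scheduler.
jobmanager.scheduler: legacy

# Leave the failover strategy unset. The default flink-conf.yaml shipped with
# the binary distribution sets it to "region", so comment out or delete the line.
# jobmanager.execution.failover-strategy: region
```

With both changes in place, the uptime metric and the restart exception in the UI should behave as they did before, according to the thread; on 1.9, removing the failover-strategy line alone was reported to be sufficient.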