Hi Gary,

Thanks for the clarification!

When we upgrade to a new Flink release, we don't start with a default
flink-conf.yaml but upgrade our existing tooling and configuration.
Therefore we noticed this issue as part of the upgrade to 1.10, and not when
we upgraded to 1.9.

I would expect many other users to be in the same camp. Should we therefore
consider making these regressions a blocker for 1.10?

Thanks,
Thomas


On Wed, Feb 5, 2020 at 4:53 AM Gary Yao <g...@apache.org> wrote:

> > also notice that the exception causing a restart is no longer displayed
> > in the UI, which is probably related?
>
> Yes, this is also related to the new scheduler. I created FLINK-15917 [1] to
> track this. Moreover, I created a ticket about the uptime metric not resetting
> [2]. Both issues already exist in 1.9 if
> "jobmanager.execution.failover-strategy" is set to "region", which is the case
> in the default flink-conf.yaml.
>
> In 1.9, unsetting "jobmanager.execution.failover-strategy" was enough to fall
> back to the previous behavior.
>
> In 1.10, you can still fall back to the previous behavior by setting
> "jobmanager.scheduler: legacy" and unsetting
> "jobmanager.execution.failover-strategy" in your flink-conf.yaml.
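>
> A minimal sketch of the 1.10 fallback as a flink-conf.yaml fragment, using
> only the two options named above. It assumes nothing else in your
> configuration overrides them:
>
>     # Fall back to the scheduler used before 1.10
>     jobmanager.scheduler: legacy
>     # Leave the failover strategy unset (shown commented out)
>     # jobmanager.execution.failover-strategy: region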
>
> I would not consider these issues blockers since there is a workaround for
> them, but of course we would like to see the new scheduler getting some
> production exposure. More detailed release notes about the caveats of the new
> scheduler will be added to the user documentation.
>
>
> > The watermark issue was https://issues.apache.org/jira/browse/FLINK-14470
>
> This should be fixed now [3].
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-15917
> [2] https://issues.apache.org/jira/browse/FLINK-15918
> [3] https://issues.apache.org/jira/browse/FLINK-8949
>
> On Wed, Feb 5, 2020 at 7:04 AM Thomas Weise <t...@apache.org> wrote:
>
>> Hi Gary,
>>
>> Thanks for the reply.
>>
>> -->
>>
>> On Tue, Feb 4, 2020 at 5:20 AM Gary Yao <g...@apache.org> wrote:
>>
>> > Hi Thomas,
>> >
>> > > 2) Was there a change in how job recovery reflects in the uptime metric?
>> > > Didn't uptime previously reset to 0 on recovery (now it just keeps
>> > > increasing)?
>> >
>> > The uptime is the difference between the current time and the time when the
>> > job transitioned to RUNNING state. By default we no longer transition the
>> > job out of the RUNNING state when restarting. This has something to do with
>> > the new scheduler which enables pipelined region failover by default [1].
>> > Actually we enabled pipelined region failover already in the binary
>> > distribution of Flink 1.9 by setting:
>> >
>> >     jobmanager.execution.failover-strategy: region
>> >
>> > in the default flink-conf.yaml. Unless you have removed this config option
>> > or you are using a custom yaml, you should be seeing this behavior in Flink
>> > 1.9. If you do not want region failover, set
>> >
>> >     jobmanager.execution.failover-strategy: full
>> >
>> >
>> We are using the default (the jobmanager.execution.failover-strategy
>> setting is not present in our flink config).
>>
>> The change in behavior I see is between the 1.9 based deployment and the
>> 1.10 RC.
>>
>> Our 1.9 branch is here:
>> https://github.com/lyft/flink/tree/release-1.9-lyft
>>
>> I also notice that the exception causing a restart is no longer displayed
>> in the UI, which is probably related?
>>
>>
>> >
>> > > 1) Is the low watermark display in the UI still broken?
>> >
>> > I was not aware that this is broken. Is there an issue tracking this bug?
>> >
>>
>> The watermark issue was https://issues.apache.org/jira/browse/FLINK-14470
>>
>> (I don't have a good way to verify it is fixed at the moment.)
>>
>> Another problem with this 1.10 RC is that the checkpointAlignmentTime
>> metric is missing. (I have not been able to investigate this further yet.)
>>
>>
>> >
>> > Best,
>> > Gary
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-14651
>> >
>> > On Tue, Feb 4, 2020 at 2:56 AM Thomas Weise <t...@apache.org> wrote:
>> >
>> >> I opened a PR for FLINK-15868
>> >> <https://issues.apache.org/jira/browse/FLINK-15868>:
>> >> https://github.com/apache/flink/pull/11006
>> >>
>> >> With that change, I was able to run an application that consumes from
>> >> Kinesis.
>> >>
>> >> I should have data tomorrow regarding the performance.
>> >>
>> >> Two questions/observations:
>> >>
>> >> 1) Is the low watermark display in the UI still broken?
>> >> 2) Was there a change in how job recovery reflects in the uptime metric?
>> >> Didn't uptime previously reset to 0 on recovery (now it just keeps
>> >> increasing)?
>> >>
>> >> Thanks,
>> >> Thomas
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Feb 3, 2020 at 10:55 AM Thomas Weise <t...@apache.org> wrote:
>> >>
>> >> > I found another issue with the Kinesis connector:
>> >> >
>> >> > https://issues.apache.org/jira/browse/FLINK-15868
>> >> >
>> >> >
>> >> > On Mon, Feb 3, 2020 at 3:35 AM Gary Yao <g...@apache.org> wrote:
>> >> >
>> >> >> Hi everyone,
>> >> >>
>> >> >> I am hereby canceling the vote due to:
>> >> >>
>> >> >>     FLINK-15837
>> >> >>     FLINK-15840
>> >> >>
>> >> >> Another RC will be created later today.
>> >> >>
>> >> >> Best,
>> >> >> Gary
>> >> >>
>> >> >> On Mon, Jan 27, 2020 at 10:06 PM Gary Yao <g...@apache.org> wrote:
>> >> >>
>> >> >> > Hi everyone,
>> >> >> > Please review and vote on the release candidate #1 for the version
>> >> >> > 1.10.0, as follows:
>> >> >> > [ ] +1, Approve the release
>> >> >> > [ ] -1, Do not approve the release (please provide specific comments)
>> >> >> >
>> >> >> >
>> >> >> > The complete staging area is available for your review, which includes:
>> >> >> > * JIRA release notes [1],
>> >> >> > * the official Apache source release and binary convenience releases to
>> >> >> > be deployed to dist.apache.org [2], which are signed with the key with
>> >> >> > fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
>> >> >> > * all artifacts to be deployed to the Maven Central Repository [4],
>> >> >> > * source code tag "release-1.10.0-rc1" [5],
>> >> >> >
>> >> >> > The announcement blog post is in the works. I will update this voting
>> >> >> > thread with a link to the pull request soon.
>> >> >> >
>> >> >> > The vote will be open for at least 72 hours. It is adopted by majority
>> >> >> > approval, with at least 3 PMC affirmative votes.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Yu & Gary
>> >> >> >
>> >> >> > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
>> >> >> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
>> >> >> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> >> >> > [4] https://repository.apache.org/content/repositories/orgapacheflink-1325
>> >> >> > [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>
