I also agree with Till and Robert's proposals. 

In general, I think we should not block the release based on the current 
assessment. Otherwise we would keep postponing the release, new blocker bugs 
would probably turn up in the meantime, and we could get stuck in a cycle that 
keeps us from delivering a final release to users in time. That does not mean 
RC4 will be the final one; we can reevaluate as we go, based on the issues that 
accumulate.

Regarding the performance regression, if possible we should reproduce it based 
on Thomas's feedback to analyze the cause, and then evaluate its effect.

Regarding FLINK-18461, after syncing with Jark offline: the bug affects one of 
the three scenarios for using the CDC feature, and the affected scenario is 
actually the one most commonly used.
My suggestion is to merge the fix into release-1.11 now, since the PR is 
already open for review, and to finalize the decision later. If this is the 
only issue once RC4 goes through, another option is to cover it in the next 
release, 1.11.1, as Robert suggested, since we can prepare the next minor 
release soon. If other blocker issues come up during voting and need to be 
resolved soon, then all of them should certainly be covered in the next RC5.

Best,
Zhijiang


------------------------------------------------------------------
From:Till Rohrmann <trohrm...@apache.org>
Send Time:Thursday, July 2, 2020, 16:46
To:dev <dev@flink.apache.org>
Cc:Zhijiang <wangzhijiang...@aliyun.com>
Subject:Re: [VOTE] Release 1.11.0, release candidate #4

I agree with Robert.

@Chesnay: The problem has probably already existed in Flink 1.10 and before 
because we cannot run jobs with eager execution calls from the web ui. I agree 
with Robert that we can/should improve our documentation in this regard, though.

@Thomas: 
1. I will update the release notes to add a short section describing that one 
needs to configure the JobManager memory. 
2. Concerning the performance regression we should look into it. I believe 
Zhijiang is very eager to learn more about your exact setup to further debug 
it. Again I agree with Robert to not block the release on it at the moment.

@Jark: How much of a problem is FLINK-18461? Will it make the CDC feature 
completely unusable, or will it only break a subset of the use cases? If 
it is the latter, then I believe that we can document the limitations and try 
to fix it asap. Depending on the remaining testing the fix might make it into 
the 1.11.0 or the 1.11.1 release.

Cheers,
Till
On Thu, Jul 2, 2020 at 10:33 AM Robert Metzger <rmetz...@apache.org> wrote:
Thanks a lot for the thorough testing Thomas! This is really helpful!

 @Chesnay: I would not block the release on this. The web submission does
 not seem to be the documented / preferred way of job submission. It is
 unlikely to harm the beginner's experience (and they would anyways not read
 the release notes). I mention the beginner experience, because they are the
 primary audience of the examples.

 Regarding FLINK-18461 / Jark's issue: I would not block the release on
 that, but still try to get it fixed asap. It is likely that this RC doesn't
 go through (given the rate at which we are finding issues), and even if it
 goes through, we can document it as a known issue in the release
 announcement and immediately release 1.11.1.
 Blocking the release on this causes quite a bit of work for the release
 managers for rolling a new RC. Until we have understood the performance
 regression Thomas is reporting, I would keep this RC open, and keep testing.


 On Thu, Jul 2, 2020 at 8:34 AM Jark Wu <imj...@gmail.com> wrote:

 > Hi,
 >
 > I'm very sorry but we just found a blocker issue FLINK-18461 [1] in the new
 > feature of changelog source (CDC).
 > This bug means that the results of queries on a changelog source can't be
 > inserted into an upsert sink (e.g. ES, JDBC, HBase),
 > which is a common case in production. CDC is one of the important features
 > of Table/SQL in this release,
 > so from my side, I hope we can have this fix in 1.11.0, otherwise, this is
 > a broken feature...
 >
 > Again, I am terribly sorry for delaying the release...
 >
 > Best,
 > Jark
 >
 > [1]: https://issues.apache.org/jira/browse/FLINK-18461
 >
 > On Thu, 2 Jul 2020 at 12:02, Zhijiang <wangzhijiang...@aliyun.com.invalid>
 > wrote:
 >
 > > Hi Thomas,
 > >
 > > Thanks for the efficient feedback.
 > >
 > > Regarding the suggestion of adding the release notes document, I agree
 > > with your point. Maybe we should adjust the vote template accordingly in
 > > the respective wiki to guide the following release processes.
 > >
 > > Regarding the performance regression, could you provide some more details
 > > so that we can measure or reproduce it on our side?
 > > E.g. I guess the topology only includes two vertices, source and sink?
 > > What is the parallelism of each vertex?
 > > Does the upstream shuffle data to the downstream via the rebalance
 > > partitioner or another one?
 > > Is the checkpoint mode exactly-once with the RocksDB state backend?
 > > Did backpressure occur in this case?
 > > How large is the regression in percent?
 > >
 > > Best,
 > > Zhijiang
 > >
 > >
 > >
 > > ------------------------------------------------------------------
 > > From:Thomas Weise <t...@apache.org>
 > > Send Time:Thursday, July 2, 2020, 09:54
 > > To:dev <dev@flink.apache.org>
 > > Subject:Re: [VOTE] Release 1.11.0, release candidate #4
 > >
 > > Hi Till,
 > >
 > > Yes, we don't have the setting in flink-conf.yaml.
 > >
 > > Generally, we carry forward the existing configuration and any change to
 > > default configuration values would impact the upgrade.
 > >
 > > Yes, since it is an incompatible change I would state it in the release
 > > notes.
 > >
 > > Thanks,
 > > Thomas
 > >
 > > BTW I found a performance regression while trying to upgrade another
 > > pipeline with this RC. It is a simple Kinesis to Kinesis job. Wasn't able
 > > to pin it down yet, symptoms include increased checkpoint alignment time.
 > >
 > > On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <trohrm...@apache.org>
 > > wrote:
 > >
 > > > Hi Thomas,
 > > >
 > > > just to confirm: When starting the image in local mode, then you don't
 > > > have any of the JobManager memory configuration settings configured in
 > > > the effective flink-conf.yaml, right? Does this mean that you have
 > > > explicitly removed `jobmanager.heap.size: 1024m` from the default
 > > > configuration? If this is the case, then I believe it was more of an
 > > > unintentional artifact that it worked before and it has been corrected
 > > > now so that one needs to specify the memory of the JM process
 > > > explicitly. Do you think it would help to explicitly state this in the
 > > > release notes?
 > > >
 > > > Cheers,
 > > > Till
 > > >
 > > > On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <t...@apache.org> wrote:
 > > >
 > > > > Thanks for preparing another RC!
 > > > >
 > > > > As mentioned in the previous RC thread, it would be super helpful if
 > > > > the release notes that are part of the documentation can be included
 > > > > [1]. It's a significant time-saver to have read those first.
 > > > >
 > > > > I found one more non-backward compatible change that would be worth
 > > > > addressing/mentioning:
 > > > >
 > > > > It is now necessary to configure the jobmanager heap size in
 > > > > flink-conf.yaml (with either jobmanager.heap.size
 > > > > or jobmanager.memory.heap.size). Why would I not want to do that
 > > > > anyways? Well, we set it dynamically for a cluster deployment via the
 > > > > flinkk8soperator, but the container image can also be used for
 > > > > testing with local mode (./bin/jobmanager.sh start-foreground local).
 > > > > That will fail if the heap wasn't configured and that's how I noticed
 > > > > it.
 > > > >
 > > > > Thanks,
 > > > > Thomas
 > > > >
 > > > > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
 > > > >
 > > > > On Tue, Jun 30, 2020 at 3:18 AM Zhijiang
 > > > > <wangzhijiang...@aliyun.com.invalid> wrote:
 > > > >
 > > > > > Hi everyone,
 > > > > >
 > > > > > Please review and vote on the release candidate #4 for the version
 > > > > > 1.11.0, as follows:
 > > > > > [ ] +1, Approve the release
 > > > > > [ ] -1, Do not approve the release (please provide specific
 > > > > > comments)
 > > > > >
 > > > > > The complete staging area is available for your review, which
 > > > > > includes:
 > > > > > * JIRA release notes [1],
 > > > > > * the official Apache source release and binary convenience
 > > > > > releases to be deployed to dist.apache.org [2], which are signed
 > > > > > with the key with fingerprint
 > > > > > 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
 > > > > > * all artifacts to be deployed to the Maven Central Repository [4],
 > > > > > * source code tag "release-1.11.0-rc4" [5],
 > > > > > * website pull request listing the new release and adding
 > > > > > announcement blog post [6].
 > > > > >
 > > > > > The vote will be open for at least 72 hours. It is adopted by
 > > > > > majority approval, with at least 3 PMC affirmative votes.
 > > > > >
 > > > > > Thanks,
 > > > > > Release Manager
 > > > > >
 > > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
 > > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
 > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
 > > > > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1377/
 > > > > > [5] https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
 > > > > > [6] https://github.com/apache/flink-web/pull/352
 > > > > >
 > > > > >
 > > > >
 > > >
 > >
 > >
 >
