Hi Roman,

Here are the checkpoint summaries for both commits:
https://docs.google.com/presentation/d/159IVXQGXabjnYJk3oVm3UP2UW_5G-TGs_u9yzYb030I/edit#slide=id.g86d15b2fc7_0_0

The config:

CheckpointConfig checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
checkpointConfig.setCheckpointInterval(*10_000*);
checkpointConfig.setMinPauseBetweenCheckpoints(*10_000*);
checkpointConfig.enableExternalizedCheckpoints(DELETE_ON_CANCELLATION);
checkpointConfig.setCheckpointTimeout(600_000);
checkpointConfig.setMaxConcurrentCheckpoints(1);
checkpointConfig.setFailOnCheckpointingErrors(true);

Changing the values marked bold to *60_000* makes the symptom disappear. I have meanwhile also verified that with the 1.11.0 release commit. I will take a look at the sleep time issue.

Thanks,
Thomas

On Fri, Aug 7, 2020 at 1:44 AM Roman Khachatryan <ro...@data-artisans.com> wrote:

> Hi Thomas,
>
> Thanks for your reply!
>
> I think you are right: we can remove this sleep and improve KinesisProducer. Probably, its snapshotState can also be sped up by forcing records to flush more often.
> Do you see that the 30s checkpointing duration is caused by KinesisProducer (or maybe other operators)?
>
> I'd also like to understand the reason behind this increase in checkpoint frequency. Can you please share these values:
> - execution.checkpointing.min-pause
> - execution.checkpointing.max-concurrent-checkpoints
> - execution.checkpointing.timeout
>
> And what is the "new" observed checkpoint frequency (or how many checkpoints are created) compared to older versions?
>
> On Fri, Aug 7, 2020 at 4:49 AM Thomas Weise <t...@apache.org> wrote:
>
>> Hi Roman,
>>
>> Indeed there are more frequent checkpoints with this change! The application was configured to checkpoint every 10s. With 1.10 ("good commit"), that leads to fewer completed checkpoints compared to 1.11 ("bad commit"). Just to be clear, the only difference between the two runs was the commit 355184d69a8519d29937725c8d85e8465d7e3a90.
>>
>> Since the sync part of checkpoints with the Kinesis producer always takes ~30 seconds, the 10s configured checkpoint frequency really had no effect before 1.11. I confirmed that both commits perform comparably by setting the checkpoint frequency and min pause to 60s.
>>
>> I still have to verify with the final 1.11.0 release commit.
>>
>> It's probably good to take a look at the Kinesis producer. Is it really necessary to have a 500ms sleep time? What's responsible for the ~30s duration in snapshotState?
>>
>> As things stand, it doesn't make sense to use checkpoint intervals < 30s when using the Kinesis producer.
>>
>> Thanks,
>> Thomas
>>
>> On Sat, Aug 1, 2020 at 2:53 PM Roman Khachatryan <ro...@data-artisans.com> wrote:
>>
>> > Hi Thomas,
>> >
>> > Thanks a lot for the analysis.
>> >
>> > The first thing that I'd check is whether checkpoints became more frequent with this commit (as each of them adds at least 500ms if there is at least one unsent record, according to FlinkKinesisProducer.snapshotState).
>> >
>> > Can you share checkpointing statistics (1.10 vs 1.11, or last "good" vs first "bad" commits)?
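For concreteness, a sketch of the working variant of the checkpoint config from the top of this thread: the same settings, with the two values marked bold raised from 10s to 60s (imports added for completeness; env is assumed to be the job's StreamExecutionEnvironment):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import static org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION;

CheckpointConfig checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
// raised from 10_000: with a ~30s sync phase in the Kinesis producer's
// snapshotState, a 10s interval plus 10s min pause produces near
// back-to-back checkpoints
checkpointConfig.setCheckpointInterval(60_000);
checkpointConfig.setMinPauseBetweenCheckpoints(60_000);
checkpointConfig.enableExternalizedCheckpoints(DELETE_ON_CANCELLATION);
checkpointConfig.setCheckpointTimeout(600_000);
checkpointConfig.setMaxConcurrentCheckpoints(1);
checkpointConfig.setFailOnCheckpointingErrors(true);

With the 60s interval and min pause there is room between checkpoints again, which matches the observation above that both commits then perform comparably.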
>> > On Fri, Jul 31, 2020 at 5:29 AM Thomas Weise <thomas.we...@gmail.com> wrote:
>> >
>> > > I ran git bisect, and the first commit that shows the regression is:
>> > >
>> > > https://github.com/apache/flink/commit/355184d69a8519d29937725c8d85e8465d7e3a90
>> > >
>> > > On Thu, Jul 23, 2020 at 6:46 PM Kurt Young <ykt...@gmail.com> wrote:
>> > >
>> > > > From my experience, Java profilers are sometimes not accurate enough to find the root cause of a performance regression. In this case, I would suggest you try out Intel VTune Amplifier to watch more detailed metrics.
>> > > >
>> > > > Best,
>> > > > Kurt
>> > > >
>> > > > On Fri, Jul 24, 2020 at 8:51 AM Thomas Weise <t...@apache.org> wrote:
>> > > >
>> > > > > The cause of the issue is anything but clear.
>> > > > >
>> > > > > Previously I had mentioned that there is no suspect change to the Kinesis connector and that I had reverted the AWS SDK change to no effect.
>> > > > >
>> > > > > https://issues.apache.org/jira/browse/FLINK-17496 actually fixed another regression in the previous release and is present before and after.
>> > > > >
>> > > > > I repeated the run with 1.11.0 core and downgraded the entire Kinesis connector to 1.10.1: nothing changes, i.e. the regression is still present. Therefore we will need to look elsewhere for the root cause.
>> > > > >
>> > > > > Regarding the time spent in snapshotState, repeat runs reveal a wide range for both versions, 1.10 and 1.11. So again, this is nothing pointing to a root cause.
>> > > > >
>> > > > > At this point, I have no ideas remaining other than doing a bisect to find the culprit. Any other suggestions?
>> > > > >
>> > > > > Thomas
>> > > > >
>> > > > > On Thu, Jul 16, 2020 at 9:19 PM Zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:
>> > > > >
>> > > > > > Hi Thomas,
>> > > > > >
>> > > > > > Thanks for your further profiling information, and glad to see we have already narrowed down the location causing the regression. Actually, I was also suspicious of #snapshotState in previous discussions, since it indeed costs much time and blocks normal operator processing.
>> > > > > >
>> > > > > > Based on your feedback below, the sleep time during #snapshotState might be the main concern, and I also dug into the implementation of FlinkKinesisProducer#snapshotState:
>> > > > > >
>> > > > > > while (producer.getOutstandingRecordsCount() > 0) {
>> > > > > >     producer.flush();
>> > > > > >     try {
>> > > > > >         Thread.sleep(500);
>> > > > > >     } catch (InterruptedException e) {
>> > > > > >         LOG.warn("Flushing was interrupted.");
>> > > > > >         break;
>> > > > > >     }
>> > > > > > }
>> > > > > >
>> > > > > > It seems that the sleep time is mainly affected by the internal operations inside the KinesisProducer implementation provided by amazonaws, which I am not very familiar with. But I noticed there were two upgrades related to it in release-1.11.0: one upgrading amazon-kinesis-producer to 0.14.0 [1], and another upgrading aws-sdk-version to 1.11.754 [2].
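A possible shape of the improvement discussed above (Roman's "remove this sleep" and Thomas's question about the 500ms) is to poll the outstanding-record count at a finer interval. A sketch only; the 50ms value is an illustrative assumption, not a tested one:

while (producer.getOutstandingRecordsCount() > 0) {
    producer.flush();
    try {
        // Poll more frequently than 500ms so snapshotState returns soon
        // after the KPL native process drains its queue, instead of
        // rounding every checkpoint up to the next 500ms step.
        Thread.sleep(50);
    } catch (InterruptedException e) {
        LOG.warn("Flushing was interrupted.");
        // restore the interrupt flag for the task thread
        Thread.currentThread().interrupt();
        break;
    }
}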
>> > > > > > You mentioned that you already reverted the SDK upgrade to verify no changes. Did you also revert [1] to verify?
>> > > > > >
>> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-17496
>> > > > > > [2] https://issues.apache.org/jira/browse/FLINK-14881
>> > > > > >
>> > > > > > Best,
>> > > > > > Zhijiang
>> > > > > >
>> > > > > > ------------------------------------------------------------------
>> > > > > > From: Thomas Weise <t...@apache.org>
>> > > > > > Send Time: 2020-07-17 (Fri) 05:29
>> > > > > > To: dev <dev@flink.apache.org>
>> > > > > > Cc: Zhijiang <wangzhijiang...@aliyun.com>; Stephan Ewen <se...@apache.org>; Arvid Heise <ar...@ververica.com>; Aljoscha Krettek <aljos...@apache.org>
>> > > > > > Subject: Re: Kinesis Performance Issue (was [VOTE] Release 1.11.0, release candidate #4)
>> > > > > >
>> > > > > > Sorry for the delay.
>> > > > > >
>> > > > > > I confirmed that the regression is due to the sink (unsurprising, since another job with the same consumer, but not the producer, runs as expected).
>> > > > > >
>> > > > > > As promised, I did CPU profiling on the problematic application, which gives more insight into the regression [1].
>> > > > > >
>> > > > > > The screenshots show that the average time for snapshotState increases from ~9s to ~28s. The data also shows the increase in sleep time during snapshotState.
>> > > > > >
>> > > > > > Does anyone, based on changes made in 1.11, have a theory why?
>> > > > > >
>> > > > > > I had previously looked at the changes to the Kinesis connector and also reverted the SDK upgrade, which did not change the situation.
>> > > > > >
>> > > > > > It will likely be necessary to drill into the sink / checkpointing details to understand the cause of the problem.
>> > > > > >
>> > > > > > Let me know if anyone has specific questions that I can answer from the profiling results.
>> > > > > >
>> > > > > > Thomas
>> > > > > >
>> > > > > > [1] https://docs.google.com/presentation/d/159IVXQGXabjnYJk3oVm3UP2UW_5G-TGs_u9yzYb030I/edit?usp=sharing
>> > > > > >
>> > > > > > On Mon, Jul 13, 2020 at 11:14 AM Thomas Weise <t...@apache.org> wrote:
>> > > > > >
>> > > > > > > + dev@ for visibility
>> > > > > > >
>> > > > > > > I will investigate further today.
>> > > > > > >
>> > > > > > > On Wed, Jul 8, 2020 at 4:42 AM Aljoscha Krettek <aljos...@apache.org> wrote:
>> > > > > > >
>> > > > > > >> On 06.07.20 20:39, Stephan Ewen wrote:
>> > > > > > >> > - Did sink checkpoint notifications change in a relevant way, for example due to some Kafka issues we addressed in 1.11 (@Aljoscha maybe?)
>> > > > > > >>
>> > > > > > >> I think that's unrelated: the Kafka fixes were isolated in Kafka, and the one bug I discovered on the way was about the Task reaper.
>> > > > > > >>
>> > > > > > >> On 07.07.20 17:51, Zhijiang wrote:
>> > > > > > >> > Sorry for my misunderstanding of the previous information, Thomas. I was assuming that the sync checkpoint duration increased after the upgrade, as was mentioned before.
>> > > > > > >> >
>> > > > > > >> > If I remember correctly, the memory state backend also has the same issue? If so, we can dismiss the RocksDB state changes. With slot sharing enabled, the downstream and upstream are probably deployed into the same slot, so there is no network shuffle effect.
>> > > > > > >> >
>> > > > > > >> > I think we need to find out whether other symptoms changed besides the performance regression, to further narrow the scope: e.g. any metrics changes, or changes to the number of TaskManagers and the number of slots per TaskManager in the deployment. A 40% regression is really big; I guess the changes should also be reflected in other places.
>> > > > > > >> >
>> > > > > > >> > I am not sure whether we can reproduce the regression in our AWS environment by writing arbitrary Kinesis jobs, since, as Thomas mentioned, there are also Kinesis jobs that behave normally after the upgrade. So it probably touches some corner case. I am very willing to provide any help with debugging if possible.
>> > > > > > >> >
>> > > > > > >> > Best,
>> > > > > > >> > Zhijiang
>> > > > > > >> >
>> > > > > > >> > ------------------------------------------------------------------
>> > > > > > >> > From: Thomas Weise <t...@apache.org>
>> > > > > > >> > Send Time: 2020-07-07 (Tue) 23:01
>> > > > > > >> > To: Stephan Ewen <se...@apache.org>
>> > > > > > >> > Cc: Aljoscha Krettek <aljos...@apache.org>; Arvid Heise <ar...@ververica.com>; Zhijiang <wangzhijiang...@aliyun.com>
>> > > > > > >> > Subject: Re: Kinesis Performance Issue (was [VOTE] Release 1.11.0, release candidate #4)
>> > > > > > >> >
>> > > > > > >> > We are deploying our apps with FlinkK8sOperator. We have one job that works as expected after the upgrade, and the one discussed here that has the performance regression.
>> > > > > > >> >
>> > > > > > >> > "The performance regression is obviously caused by the long duration of the sync checkpoint process in the Kinesis sink operator, which blocks normal data processing until it back-pressures the source."
>> > > > > > >> >
>> > > > > > >> > That's a constant: before (1.10) and after the upgrade have the same sync checkpointing time. The question is what change came in with the upgrade.
>> > > > > > >> >
>> > > > > > >> > On Tue, Jul 7, 2020 at 7:33 AM Stephan Ewen <se...@apache.org> wrote:
>> > > > > > >> >
>> > > > > > >> > @Thomas Just one thing real quick: Are you using the standalone setup scripts (like start-cluster.sh, and the former "slaves" file)? Be aware that this is now called "workers", to avoid sensitive names.
>> > > > > > >> > In one internal benchmark we saw quite a lot of slowdown initially, before seeing that the cluster was not a distributed cluster any more ;-)
>> > > > > > >> >
>> > > > > > >> > On Tue, Jul 7, 2020 at 9:08 AM Zhijiang <wangzhijiang...@aliyun.com> wrote:
>> > > > > > >> >
>> > > > > > >> > Thanks for kicking this off and helping with the analysis, Stephan! Thanks for the further feedback and investigation, Thomas!
>> > > > > > >> >
>> > > > > > >> > The performance regression is obviously caused by the long duration of the sync checkpoint process in the Kinesis sink operator, which blocks normal data processing until it back-pressures the source.
>> > > > > > >> > Maybe we could dig into the sync execution of the checkpoint, e.g. break down the steps inside the respective operator#snapshotState to see which operation costs most of the time; then we might find the root cause of that cost.
>> > > > > > >> >
>> > > > > > >> > Looking forward to the further progress. :)
>> > > > > > >> >
>> > > > > > >> > Best,
>> > > > > > >> > Zhijiang
>> > > > > > >> >
>> > > > > > >> > ------------------------------------------------------------------
>> > > > > > >> > From: Stephan Ewen <se...@apache.org>
>> > > > > > >> > Send Time: 2020-07-07 (Tue) 14:52
>> > > > > > >> > To: Thomas Weise <t...@apache.org>
>> > > > > > >> > Cc: Stephan Ewen <se...@apache.org>; Zhijiang <wangzhijiang...@aliyun.com>; Aljoscha Krettek <aljos...@apache.org>; Arvid Heise <ar...@ververica.com>
>> > > > > > >> > Subject: Re: Kinesis Performance Issue (was [VOTE] Release 1.11.0, release candidate #4)
>> > > > > > >> >
>> > > > > > >> > Thank you for digging so deeply. Mysterious thing, this regression.
>> > > > > > >> >
>> > > > > > >> > On Mon, Jul 6, 2020, 22:56 Thomas Weise <t...@apache.org> wrote:
>> > > > > > >> >
>> > > > > > >> > @Stephan: yes, I refer to the sync time in the web UI (it is unchanged between 1.10 and 1.11 for the specific pipeline).
>> > > > > > >> >
>> > > > > > >> > I verified that increasing the checkpointing interval does not make a difference.
>> > > > > > >> >
>> > > > > > >> > I looked at the Kinesis connector changes since 1.10.1 and don't see anything that could cause this.
>> > > > > > >> >
>> > > > > > >> > Another pipeline that is using the Kinesis consumer (but not the producer) performs as expected.
>> > > > > > >> >
>> > > > > > >> > I tried reverting the AWS SDK version change; symptoms remain unchanged:
>> > > > > > >> >
>> > > > > > >> > diff --git a/flink-connectors/flink-connector-kinesis/pom.xml b/flink-connectors/flink-connector-kinesis/pom.xml
>> > > > > > >> > index a6abce23ba..741743a05e 100644
>> > > > > > >> > --- a/flink-connectors/flink-connector-kinesis/pom.xml
>> > > > > > >> > +++ b/flink-connectors/flink-connector-kinesis/pom.xml
>> > > > > > >> > @@ -33,7 +33,7 @@ under the License.
>> > > > > > >> >  	<artifactId>flink-connector-kinesis_${scala.binary.version}</artifactId>
>> > > > > > >> >  	<name>flink-connector-kinesis</name>
>> > > > > > >> >  	<properties>
>> > > > > > >> > -		<aws.sdk.version>1.11.754</aws.sdk.version>
>> > > > > > >> > +		<aws.sdk.version>1.11.603</aws.sdk.version>
>> > > > > > >> >  		<aws.kinesis-kcl.version>1.11.2</aws.kinesis-kcl.version>
>> > > > > > >> >  		<aws.kinesis-kpl.version>0.14.0</aws.kinesis-kpl.version>
>> > > > > > >> >  		<aws.dynamodbstreams-kinesis-adapter.version>1.5.0</aws.dynamodbstreams-kinesis-adapter.version>
>> > > > > > >> >
>> > > > > > >> > I'm planning to take a look with a profiler next.
>> > > > > > >> >
>> > > > > > >> > Thomas
>> > > > > > >> >
>> > > > > > >> > On Mon, Jul 6, 2020 at 11:40 AM Stephan Ewen <se...@apache.org> wrote:
>> > > > > > >> >
>> > > > > > >> > Hi all!
>> > > > > > >> >
>> > > > > > >> > Forking this thread out of the release vote thread. From what Thomas describes, it really sounds like a sink-specific issue.
>> > > > > > >> >
>> > > > > > >> > @Thomas: When you say the sink has a long synchronous checkpoint time, you mean the time that is shown as "sync time" in the metrics and web UI? That does not include any network buffer related operations; it is purely the operator's time.
>> > > > > > >> >
>> > > > > > >> > Can we dig into the changes we did in sinks:
>> > > > > > >> >   - Kinesis version upgrade, AWS library updates
>> > > > > > >> >   - Could it be that some call (checkpoint complete) that was previously (1.10) in a separate thread is now in the mailbox, and this simply reduces the number of threads that do the work?
>> > > > > > >> >   - Did sink checkpoint notifications change in a relevant way, for example due to some Kafka issues we addressed in 1.11 (@Aljoscha maybe?)
>> > > > > > >> >
>> > > > > > >> > Best,
>> > > > > > >> > Stephan
>> > > > > > >> >
>> > > > > > >> > On Sun, Jul 5, 2020 at 7:10 AM Zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:
>> > > > > > >> >
>> > > > > > >> > Hi Thomas,
>> > > > > > >> >
>> > > > > > >> > Regarding [2], there are more details in the Jira description (https://issues.apache.org/jira/browse/FLINK-16404).
>> > > > > > >> >
>> > > > > > >> > I can also give some basic explanations here to dismiss the concern.
>> > > > > > >> > 1. In the past, the buffers following the barrier were cached on the downstream side before alignment.
>> > > > > > >> > 2. In 1.11, the upstream does not send the buffers after the barrier. When the downstream finishes the alignment, it notifies the upstream to continue sending the following buffers, since it can process them after alignment.
>> > > > > > >> > 3. The only difference is that the temporarily blocked buffers are cached either on the downstream side or on the upstream side before alignment.
>> > > > > > >> > 4. The side effect is the additional notification cost for every barrier alignment. If the downstream and upstream are deployed in separate TaskManagers, the cost is the network transport delay (the effect can be ignored based on our testing with a 1s checkpoint interval). For the shared slot in your case, the cost is only one method call in the processor and can also be ignored.
>> > > > > > >> >
>> > > > > > >> > You mentioned "In this case, the downstream task has a high average checkpoint duration (~30s, sync part)." This duration does not reflect the changes above; it only indicates the duration of calling `Operation.snapshotState`.
>> > > > > > >> > If this duration is beyond your expectation, you can check or debug whether the source/sink operations might take more time to finish `snapshotState` in practice. E.g. you can make the implementation of this method empty to further verify the effect.
>> > > > > > >> >
>> > > > > > >> > Best,
>> > > > > > >> > Zhijiang
>> > > > > > >> >
>> > > > > > >> > ------------------------------------------------------------------
>> > > > > > >> > From: Thomas Weise <t...@apache.org>
>> > > > > > >> > Send Time: 2020-07-05 (Sun) 12:22
>> > > > > > >> > To: dev <dev@flink.apache.org>; Zhijiang <wangzhijiang...@aliyun.com>
>> > > > > > >> > Cc: Yingjie Cao <kevin.ying...@gmail.com>
>> > > > > > >> > Subject: Re: [VOTE] Release 1.11.0, release candidate #4
>> > > > > > >> >
>> > > > > > >> > Hi Zhijiang,
>> > > > > > >> >
>> > > > > > >> > Could you please point me to more details regarding: "[2]: Delay sending the buffers following the checkpoint barrier on the upstream side until barrier alignment on the downstream side."
>> > > > > > >> >
>> > > > > > >> > In this case, the downstream task has a high average checkpoint duration (~30s, sync part). If there was a change to hold buffers depending on downstream performance, could this possibly apply to this case (even when there is no shuffle that would require alignment)?
>> > > > > > >> >
>> > > > > > >> > Thanks,
>> > > > > > >> > Thomas
>> > > > > > >> >
>> > > > > > >> > On Sat, Jul 4, 2020 at 7:39 AM Zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:
>> > > > > > >> >
>> > > > > > >> > > Hi Thomas,
>> > > > > > >> > >
>> > > > > > >> > > Thanks for the further updates.
>> > > > > > >> > >
>> > > > > > >> > > I guess we can dismiss the network stack changes, since in your case the downstream and upstream would probably be deployed in the same slot, bypassing the network data shuffle.
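As a concrete version of Zhijiang's suggestion above (making the snapshotState implementation empty to verify its effect), a debugging-only sketch. It assumes FlinkKinesisProducer and its snapshotState are non-final; skipping the flush breaks the flush-on-checkpoint guarantee, so this is only useful for measuring:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;

// Test-only sink: skips the flush loop so the checkpoint sync time can be
// compared with and without FlinkKinesisProducer#snapshotState.
public class NoFlushKinesisProducer<T> extends FlinkKinesisProducer<T> {

    public NoFlushKinesisProducer(SerializationSchema<T> schema, Properties configProps) {
        super(schema, configProps);
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // intentionally empty: un-flushed records are NOT persisted in the
        // checkpoint, so delivery guarantees are lost; measurement use only
    }
}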
>> > > > > > >> > > Also, I guess release-1.11 does not bring a general performance regression in the runtime engine: we did performance testing for all the general cases via [1] in a real cluster beforehand, and the results met expectations. But we indeed have not tested the specific source and sink connectors yet, as far as I know.
>> > > > > > >> > >
>> > > > > > >> > > Regarding your ~40% performance regression, I wonder whether it is related to specific source/sink changes (e.g. Kinesis) or to environment issues in a corner case. If possible, it would be helpful to further determine whether the regression is caused by Kinesis, by replacing the Kinesis source & sink and keeping everything else the same.
>> > > > > > >> > >
>> > > > > > >> > > As you said, it would be efficient to contact you directly next week to further discuss this issue. We are willing/eager to provide any help to resolve it soon.
>> > > > > > >> > >
>> > > > > > >> > > Besides that, I guess this issue should not be a blocker for the release, since it is probably a corner case based on the current analysis. If we conclude that anything needs to be resolved after the final release, we can also bring out the next minor release, 1.11.1, soon.
>> > > > > > >> > >
>> > > > > > >> > > [1] https://issues.apache.org/jira/browse/FLINK-18433
>> > > > > > >> > >
>> > > > > > >> > > Best,
>> > > > > > >> > > Zhijiang
>> > > > > > >> > >
>> > > > > > >> > > ------------------------------------------------------------------
>> > > > > > >> > > From: Thomas Weise <t...@apache.org>
>> > > > > > >> > > Send Time: 2020-07-04 (Sat) 12:26
>> > > > > > >> > > To: dev <dev@flink.apache.org>; Zhijiang <wangzhijiang...@aliyun.com>
>> > > > > > >> > > Cc: Yingjie Cao <kevin.ying...@gmail.com>
>> > > > > > >> > > Subject: Re: [VOTE] Release 1.11.0, release candidate #4
>> > > > > > >> > >
>> > > > > > >> > > Hi Zhijiang,
>> > > > > > >> > >
>> > > > > > >> > > It will probably be best if we connect next week and discuss the issue directly, since this could be quite difficult to reproduce.
>> > > > > > >> > >
>> > > > > > >> > > Before the testing result on our side comes out for your respective job case, I have some other questions to confirm for further analysis:
>> > > > > > >> > > - How much regression (in percent) did you find after switching to 1.11?
>> > > > > > >> > >
>> > > > > > >> > > ~40% throughput decline
>> > > > > > >> > >
>> > > > > > >> > > - Is there any network bottleneck in your cluster? E.g. is the network bandwidth saturated by other jobs? If so, it might be affected more by [2] above.
>> > > > > > >> > >
>> > > > > > >> > > The test runs on a k8s cluster that is also used for other production jobs. There is no reason to believe the network is the bottleneck.
>> > > > > > >> > >
>> > > > > > >> > > - Did you adjust the default network buffer settings? E.g. "taskmanager.network.memory.floating-buffers-per-gate" or "taskmanager.network.memory.buffers-per-channel"
>> > > > > > >> > >
>> > > > > > >> > > The job is using the defaults, i.e. we don't configure the settings. If you want me to try specific settings in the hope that it will help to isolate the issue, please let me know.
>> > > > > > >> > >
>> > > > > > >> > > - I guess the topology has three vertices "KinesisConsumer -> Chained FlatMap -> KinesisProducer", and the partition mode for "KinesisConsumer -> FlatMap" and "FlatMap -> KinesisProducer" is "forward" for both? If so, the edge connection is one-to-one, not all-to-all, and [1][2] above should have no effect in theory with the default network buffer settings.
>> > > > > > >> > >
>> > > > > > >> > > There are only 2 vertices and the edge is "forward".
>> > > > > > >> > >
>> > > > > > >> > > - With slot sharing, I guess the parallel tasks of these three vertices would probably be deployed into the same slot, so the data shuffle is via an in-memory queue, not the network stack. If so, [2] above should have no effect.
>> > > > > > >> > >
>> > > > > > >> > > Yes, vertices share slots.
>> > > > > > >> > >
>> > > > > > >> > > - I also saw some Jira changes for Kinesis in this release; could you confirm that these changes would not affect performance?
>> > > > > > >> > >
>> > > > > > >> > > I will need to take a look. 1.10 already had a regression introduced by the Kinesis producer update.
>> > > > > > >> > >
>> > > > > > >> > > Thanks,
>> > > > > > >> > > Thomas
>> > > > > > >> > >
>> > > > > > >> > > On Thu, Jul 2, 2020 at 11:46 PM Zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:
>> > > > > > >> > >
>> > > > > > >> > > > Hi Thomas,
>> > > > > > >> > > >
>> > > > > > >> > > > Thanks for your reply with rich information!
>> > > > > > >> > > >
>> > > > > > >> > > > We are trying to reproduce your case in our cluster to further verify it, and @Yingjie Cao is working on it now. As we do not have the Kinesis consumer and producer internally, we will construct a common source and sink instead, under backpressure.
>> > > > > > >> > > >
>> > > > > > >> > > > Firstly, we can dismiss the RocksDB factor in this release, since you also mentioned that "filesystem leads to same symptoms".
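If specific settings are to be tried (per Thomas's offer above), a sketch of setting the two options explicitly; the option constants and the default values shown are stated to the best of my knowledge and should be treated as assumptions:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.NettyShuffleEnvironmentOptions;

Configuration config = new Configuration();
// taskmanager.network.memory.buffers-per-channel (documented default: 2)
config.setInteger(NettyShuffleEnvironmentOptions.NETWORK_BUFFERS_PER_CHANNEL, 2);
// taskmanager.network.memory.floating-buffers-per-gate (documented default: 8)
config.setInteger(NettyShuffleEnvironmentOptions.NETWORK_EXTRA_BUFFERS_PER_GATE, 8);

Equivalently, the same two keys can simply be put into flink-conf.yaml.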
>> > > > > > >> > > > Secondly, if my understanding is right, you emphasize that the regression only exists for jobs with a low checkpoint interval (10s). Based on that, I have two suspicions about the network related changes in this release:
>> > > > > > >> > > > - [1]: Limited the maximum backlog value (default 10) in the subpartition queue.
>> > > > > > >> > > > - [2]: Delay sending the buffers following the checkpoint barrier on the upstream side until barrier alignment on the downstream side.
>> > > > > > >> > > >
>> > > > > > >> > > > These changes are motivated by reducing the in-flight buffers to speed up checkpointing, especially in the case of backpressure. In theory they should have a very minor performance effect, and we tested them in a cluster before merging to verify they were within expectations, but maybe there are corner cases we have not thought of before.
>> > > > > > >> > > >
>> > > > > > >> > > > Before the testing result on our side comes out for your respective job case, I have some other questions to confirm for further analysis:
>> > > > > > >> > > > - How much regression (in percent) did you find after switching to 1.11?
>> > > > > > >> > > > - Is there any network bottleneck in your cluster? E.g. is the network bandwidth saturated by other jobs? If so, it might be affected more by [2] above.
>> > > > > > >> > > > - Did you adjust the default network buffer settings? E.g. "taskmanager.network.memory.floating-buffers-per-gate" or "taskmanager.network.memory.buffers-per-channel"
>> > > > > > >> > > > - I guess the topology has three vertices "KinesisConsumer -> Chained FlatMap -> KinesisProducer", and the partition mode for "KinesisConsumer -> FlatMap" and "FlatMap -> KinesisProducer" is "forward" for both? If so, the edge connection is one-to-one, not all-to-all, and [1][2] above should have no effect in theory with the default network buffer settings.
>> > > > > > >> > > > - With slot sharing, I guess the parallel tasks of these three vertices would probably be deployed into the same slot, so the data shuffle is via an in-memory queue, not the network stack. If so, [2] above should have no effect.
>> > > > > > >> > > > - I also saw some Jira changes for Kinesis in this release; could you confirm that these changes would not affect performance?
>> > > > > > >> > > >
>> > > > > > >> > > > Best,
>> > > > > > >> > > > Zhijiang
>> > > > > > >> > > >
>> > > > > > >> > > > ------------------------------------------------------------------
>> > > > > > >> > > > From: Thomas Weise <t...@apache.org>
>> > > > > > >> > > > Send Time: 2020-07-03 (Fri) 01:07
>> > > > > > >> > > > To: dev <dev@flink.apache.org>; Zhijiang <wangzhijiang...@aliyun.com>
>> > > > > > >> > > > Subject: Re: [VOTE] Release 1.11.0, release candidate #4
>> > > > > > >> > > >
>> > > > > > >> > > > Hi Zhijiang,
>> > > > > > >> > > >
>> > > > > > >> > > > The performance degradation manifests in backpressure, which leads to a growing backlog in the source. I switched a few times between 1.10 and 1.11 and the behavior is consistent.
>> > > > > > >> > > >
>> > > > > > >> > > > The DAG is:
>> > > > > > >> > > >
>> > > > > > >> > > > KinesisConsumer -> (Flat Map, Flat Map, Flat Map) -------- forward ---------> KinesisProducer
>> > > > > > >> > > >
>> > > > > > >> > > > Parallelism: 160
>> > > > > > >> > > > No shuffle/rebalance.
>> > > > > > >> > > >
>> > > > > > >> > > > Checkpointing config:
>> > > > > > >> > > >
>> > > > > > >> > > > Checkpointing Mode: Exactly Once
>> > > > > > >> > > > Interval: 10s
>> > > > > > >> > > > Timeout: 10m 0s
>> > > > > > >> > > > Minimum Pause Between Checkpoints: 10s
>> > > > > > >> > > > Maximum Concurrent Checkpoints: 1
>> > > > > > >> > > > Persist Checkpoints Externally: Enabled (delete on cancellation)
>> > > > > > >> > > >
>> > > > > > >> > > > State backend: RocksDB (filesystem leads to same symptoms)
>> > > > > > >> > > > Checkpoint size is tiny (500KB)
>> > > > > > >> > > >
>> > > > > > >> > > > An interesting difference to another job that I had upgraded successfully is the low checkpointing interval.
>> > > > > > >> > > >
>> > > > > > >> > > > Thanks,
>> > > > > > >> > > > Thomas
>> > > > > > >> > > >
>> > > > > > >> > > > On Wed, Jul 1, 2020 at 9:02 PM Zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:
>> > > > > > >> > > >
>> > > > > > >> > > > > Hi Thomas,
>> > > > > > >> > > > >
>> > > > > > >> > > > > Thanks for the efficient feedback.
>> > > > > > >> > > > >
>> > > > > > >> > > > > Regarding the suggestion of adding the release notes document, I agree with your point. Maybe we should adjust the vote template accordingly in the respective wiki to guide the following release processes.
>> > > > > > >> > > > >
>> > > > > > >> > > > > Regarding the performance regression, could you provide some more details so we can better measure or reproduce it on our side?
>> > > > > > >> > > > > E.g. I guess the topology only includes two vertices, source and sink?
>> > > > > > >> > > > > What is the parallelism of every vertex?
>> > > > > > >> > > > > Does the upstream shuffle data to the downstream via the rebalance partitioner or another one?
>> > > > > > >> > > > > Is the checkpoint mode exactly-once with the RocksDB state backend?
>> > > > > > >> > > > > Did backpressure happen in this case?
>> > > > > > >> > > > > How much regression (in percent) is there in this case?
>> > > > > > >> > > > >
>> > > > > > >> > > > > Best,
>> > > > > > >> > > > > Zhijiang
>> > > > > > >> > > > >
>> > > > > > >> > > > > ------------------------------------------------------------------
>> > > > > > >> > > > > From: Thomas Weise <t...@apache.org>
>> > > > > > >> > > > > Send Time: 2020-07-02 (Thu) 09:54
>> > > > > > >> > > > > To: dev <dev@flink.apache.org>
>> > > > > > >> > > > > Subject: Re: [VOTE] Release 1.11.0, release candidate #4
>> > > > > > >> > > > >
>> > > > > > >> > > > > Hi Till,
>> > > > > > >> > > > >
>> > > > > > >> > > > > Yes, we don't have the setting in flink-conf.yaml.
>> > > > > > >> > > > >
>> > > > > > >> > > > > Generally, we carry forward the existing configuration, and any change to default configuration values would impact the upgrade.
>> > > > > > >> > > > >
>> > > > > > >> > > > > Yes, since it is an incompatible change, I would state it in the release notes.
>> > > > > > >> > > > >
>> > > > > > >> > > > > Thanks,
>> > > > > > >> > > > > Thomas
>> > > > > > >> > > > >
>> > > > > > >> > > > > BTW, I found a performance regression while trying to upgrade another pipeline with this RC. It is a simple Kinesis-to-Kinesis job. I wasn't able to pin it down yet; symptoms include increased checkpoint alignment time.
>> > > > > > >> > > > >
>> > > > > > >> > > > > On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <trohrm...@apache.org> wrote:
>> > > > > > >> > > > >
>> > > > > > >> > > > > > Hi Thomas,
>> > > > > > >> > > > > >
>> > > > > > >> > > > > > Just to confirm: when starting the image in local mode, you don't have any of the JobManager memory configuration settings configured in the effective flink-conf.yaml, right? Does this mean that you have explicitly removed `jobmanager.heap.size: 1024m` from the default configuration? If this is the case, then I believe it was more of an unintentional artifact that it worked before, and it has been corrected now so that one needs to specify the memory of the JM process explicitly. Do you think it would help to explicitly state this in the release notes?
>> > > > > > >> > > > > >
>> > > > > > >> > > > > > Cheers,
>> > > > > > >> > > > > > Till
>> > > > > > >> > > > > >
>> > > > > > >> > > > > > On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <t...@apache.org> wrote:
>> > > > > > >> > > > > >
>> > > > > > >> > > > > > > Thanks for preparing another RC!
>> > > > > > >> > > > > > >
>> > > > > > >> > > > > > > As mentioned in the previous RC thread, it would be super helpful if the release notes that are part of the documentation could be included [1]. It's a significant time-saver to have read those first.
>> > > > > > >> > > > > > >
>> > > > > > >> > > > > > > I found one more non-backward-compatible change that would be worth addressing/mentioning:
>> > > > > > >> > > > > > >
>> > > > > > >> > > > > > > It is now necessary to configure the jobmanager heap size in flink-conf.yaml (with either jobmanager.heap.size or jobmanager.memory.heap.size). Why would I not want to do that anyway? Well, we set it dynamically for a cluster deployment via the flinkk8soperator, but the container image can also be used for testing with local mode (./bin/jobmanager.sh start-foreground local). That will fail if the heap wasn't configured, and that's how I noticed it.
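A sketch of the fix for the local-mode failure described above; either key in flink-conf.yaml works, per the discussion, and 1024m is the value from the default configuration that Till mentions:

# flink-conf.yaml
jobmanager.heap.size: 1024m
# or, with the new memory model introduced in 1.11:
# jobmanager.memory.heap.size: 1024m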
>> > > > > > >> > > > > > > Thanks,
>> > > > > > >> > > > > > > Thomas
>> > > > > > >> > > > > > >
>> > > > > > >> > > > > > > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
>> > > > > > >> > > > > > >
>> > > > > > >> > > > > > > On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:
>> > > > > > >> > > > > > >
>> > > > > > >> > > > > > > > Hi everyone,
>> > > > > > >> > > > > > > >
>> > > > > > >> > > > > > > > Please review and vote on the release candidate #4 for the version 1.11.0, as follows:
>> > > > > > >> > > > > > > > [ ] +1, Approve the release
>> > > > > > >> > > > > > > > [ ] -1, Do not approve the release (please provide specific comments)
>> > > > > > >> > > > > > > >
>> > > > > > >> > > > > > > > The complete staging area is available for your review, which includes:
>> > > > > > >> > > > > > > > * JIRA release notes [1],
>> > > > > > >> > > > > > > > * the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
>> > > > > > >> > > > > > > > * all artifacts to be deployed to the Maven Central Repository [4],
>> > > > > > >> > > > > > > > * source code tag "release-1.11.0-rc4" [5],
>> > > > > > >> > > > > > > > * website pull request listing the new release and adding the announcement blog post [6].
>> > > > > > >> > > > > > > >
>> > > > > > >> > > > > > > > The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.
>> > > > > > >> > > > > > > >
>> > > > > > >> > > > > > > > Thanks,
>> > > > > > >> > > > > > > > Release Manager
>> > > > > > >> > > > > > > >
>> > > > > > >> > > > > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
>> > > > > > >> > > > > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
>> > > > > > >> > > > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> > > > > > >> > > > > > > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1377/
>> > > > > > >> > > > > > > > [5] https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
>> > > > > > >> > > > > > > > [6] https://github.com/apache/flink-web/pull/352
>> >
>> > --
>> > Regards,
>> > Roman
>
> --
> Regards,
> Roman