Hi all,

There are quite a few instabilities in our builds right now (master + release-1.9), some of which are directly or suspiciously related to the 1.9 release.

I'll categorize the instabilities into those we were already tracking in the 1.9 Burndown Kanban board [1] prior to this email, and those that seem to be new or were not monitored, so that we draw additional attention to them:

*Instabilities that were already being tracked*

- FLINK-13242: StandaloneResourceManagerTest.testStartupPeriod fails on Travis [2]
  A fix for this is coming with FLINK-13408 (Schedule StandaloneResourceManager.setFailUnfulfillableRequest whenever the leadership is acquired) [3]

*Newly discovered instabilities that we should also start monitoring*

- FLINK-13484: ConnectedComponents E2E fails with ResourceNotAvailableException [4]
- FLINK-13487: TaskExecutorPartitionLifecycleTest.testPartitionReleaseAfterReleaseCall failed on Travis [5]. FLINK-13476 (Partitions not being properly released on cancel) could be the cause [6].
- FLINK-13488: flink-python fails to build on Travis due to Python 3.3 install failure [7]
- FLINK-13489: Heavy deployment E2E fails quite consistently on Travis with TM heartbeat timeout [8]
- FLINK-9900: ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles deadlocks [9]
- FLINK-13377: Streaming SQL E2E fails on Travis with mismatching outputs (could just be that the SQL query tested on Travis is nondeterministic) [10]

Cheers,
Gordon

[1] https://issues.apache.org/jira/secure/RapidBoard.jspa?projectKey=FLINK&rapidView=328
[2] https://issues.apache.org/jira/browse/FLINK-13242
[3] https://issues.apache.org/jira/browse/FLINK-13408
[4] https://issues.apache.org/jira/browse/FLINK-13484
[5] https://issues.apache.org/jira/browse/FLINK-13487
[6] https://issues.apache.org/jira/browse/FLINK-13476
[7] https://issues.apache.org/jira/browse/FLINK-13488
[8] https://issues.apache.org/jira/browse/FLINK-13489
[9] https://issues.apache.org/jira/browse/FLINK-9900
[10] https://issues.apache.org/jira/browse/FLINK-13377

On Sun, Jul 28, 2019 at 6:14 AM zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:

> Hi Gordon,
>
> Thanks for the updates on the current progress.
> In addition, it might be better to also cover the fix for the network
> resource leak in Jira ticket [1], which I think will be merged soon.
>
> [1] FLINK-13245: This fixes the leak of releasing reader/view with
> partition in network stack.
>
> Best,
> Zhijiang
> ------------------------------------------------------------------
> From: Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> Send Time: July 27, 2019 (Saturday) 10:41
> To: dev <dev@flink.apache.org>
> Subject: Re: [ANNOUNCE] Progress updates for Apache Flink 1.9.0 release
>
> Hi all,
>
> It's been a while since our last update for the release testing of 1.9.0,
> so I want to bring attention to the current status of the release.
>
> We are approaching RC1 soon, waiting on the following specific last ongoing
> threads to be closed:
> - FLINK-13241: This fixes a problem where when using YARN, slot allocation
> requests may be ignored [1]
> - FLINK-13371: Potential partitions resource leak in case of producer
> restarts [2]
> - FLINK-13350: Distinguish between temporary tables and persisted tables
> [3]. Strictly speaking this would be a new feature, but there was a
> discussion here [4] to include a workaround for now in 1.9.0, and a proper
> solution later on in 1.10.x.
> - FLINK-12858: Potential distributed deadlock in case of synchronous
> savepoint failure [5]
>
> The above is the critical path for moving forward with an RC1 for official
> voting.
> All of them have PRs already, and are currently being reviewed or close to
> being merged.
>
> Cheers,
> Gordon
>
> [1] https://issues.apache.org/jira/browse/FLINK-13241
> [2] https://issues.apache.org/jira/browse/FLINK-13371
> [3] https://issues.apache.org/jira/browse/FLINK-13350
> [4] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-temporary-tables-in-SQL-API-td30831.html
> [5] https://issues.apache.org/jira/browse/FLINK-12858
>
> On Tue, Jul 16, 2019 at 5:26 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
> > Update: RC0 for 1.9.0 has been created. Please see [1] for the preview
> > source / binary releases and Maven artifacts.
> >
> > Cheers,
> > Gordon
> >
> > [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/PREVIEW-Apache-Flink-1-9-0-release-candidate-0-td30583.html
> >
> > On Mon, Jul 15, 2019 at 6:39 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> > wrote:
> >
> >> Hi Flink devs,
> >>
> >> As previously announced by Kurt [1], the release branch for 1.9.0 has
> >> been cut [2] and we've now started the testing phase for this release, as
> >> well as resolving remaining blockers.
> >>
> >> I want to quickly provide an overview of our progress here.
> >> Also, over the course of the testing phase, we will update this mail
> >> thread every 2-3 days with the overall progress of the release to keep you
> >> updated.
> >>
> >> *1. Remaining blockers and critical issues*
> >> You can find a link here [3] for a release Kanban board that provides an
> >> overview of the remaining blockers and critical issues for releasing 1.9.0.
> >> The issues listed there are high priority for the release, so any help
> >> with reviewing or fixing them is highly appreciated!
> >> If you do assign yourself to any unassigned issue and start working on
> >> it, please make sure to pull it to the "In Progress" column to let others
> >> be aware of this.
> >>
> >> *2. Creating RC 0 for 1.9.0*
> >> We will create RC0 now to drive forward the testing efforts.
> >> This should be ready by tomorrow morning (July 16, 8am CET).
> >> Note that we will not have an official vote for RC0, as this is mainly to
> >> drive testing efforts.
> >> RC1 with an official vote will be created once the blockers listed in [3]
> >> are resolved.
> >>
> >> Cheers,
> >> Gordon
> >>
> >> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Flink-1-9-release-branch-has-been-created-td30500.html
> >> [2] https://gitbox.apache.org/repos/asf?p=flink.git;a=shortlog;h=refs/heads/release-1.9
> >> [3] https://issues.apache.org/jira/secure/RapidBoard.jspa?projectKey=FLINK&rapidView=328