Thanks for monitoring the release progress and kindly reminding us, Robert!

Minor: the link below shows the complete list of existing blockers:
https://issues.apache.org/jira/issues/?filter=12349334

Best Regards,
Yu

On Tue, 13 Oct 2020 at 03:03, Robert Metzger <rmetz...@apache.org> wrote:

> Hi all!
>
> According to the plan <https://cwiki.apache.org/confluence/display/FLINK/1.12+Release> discussed earlier in the release cycle, the feature freeze is expected to happen in the week of October 26th. That's in 2.5 weeks from now.
>
> I believe now is the time to discuss if we want to postpone the feature freeze.
> In my opinion, I would prefer to stick to the original schedule and rather delay features to the 1.13 release if they are not ready yet.
>
> From a stability perspective, we currently have the following situation:
> - 6 blockers: https://issues.apache.org/jira/browse/FLINK-19154?filter=12349334, most of them are making progress, I notified people on those where the status is unclear.
> - 80 test instabilities:
> https://issues.apache.org/jira/browse/FLINK-18117?filter=12348580&jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20updated%20DESC%2C%20created%20DESC
> - The CI system is a bit unstable these days: The e2e tests are often timing out. I will look into options to mitigate this.
>
> Drilling deeper into the test instabilities, these are some notable clusters of test instabilities (with recent failures, usually more than once) [tests marked with >> have nobody assigned]
>
> E2E tests, probably all test infrastructure
> >> "Kerberized YARN per-job on Docker test" fails with "Could not start hadoop cluster."
> https://issues.apache.org/jira/browse/FLINK-18117
> >> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) failed due to download error
> https://issues.apache.org/jira/browse/FLINK-17424
> - "ES6 ElasticsearchSinkITCase unstable"
> https://issues.apache.org/jira/browse/FLINK-17159
> - "Avro Confluent Schema Registry nightly end-to-end test failed with "Register operation timed out; error code: 50002""
> https://issues.apache.org/jira/browse/FLINK-19422
> - "SQLClientHBaseITCase.testHBase fails on azure"
> https://issues.apache.org/jira/browse/FLINK-18570
>
> New Source API
> - "SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent is instable"
> https://issues.apache.org/jira/browse/FLINK-19427
> >> "CoordinatedSourceITCase.testEnumeratorReaderCommunication hangs"
> https://issues.apache.org/jira/browse/FLINK-19448
> - "SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent gets stuck"
> https://issues.apache.org/jira/browse/FLINK-19489
>
> Distributed Coordination
> - "LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with "NoResourceAvailableException: Could not allocate the required slot within slot request timeout""
> https://issues.apache.org/jira/browse/FLINK-19237
> - "TaskExecutorSubmissionTest#testFailingScheduleOrUpdateConsumers"
> https://issues.apache.org/jira/browse/FLINK-17458
> - "ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange times out"
> https://issues.apache.org/jira/browse/FLINK-19514
> - "ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange: ZooKeeper unexpectedly modified"
> https://issues.apache.org/jira/browse/FLINK-19458
>
> Kafka
> >> "KafkaITCase failing with "Failed to send data to Kafka: This server does not host this topic-partition""
> https://issues.apache.org/jira/browse/FLINK-18444
> >> "KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>"
> https://issues.apache.org/jira/browse/FLINK-17949
> - "KafkaITCase.testKeyValueSupport failure due to assertion error."
> https://issues.apache.org/jira/browse/FLINK-15745
> - "KafkaITCase.testStartFromGroupOffsets times out on azure"
> https://issues.apache.org/jira/browse/FLINK-18648
> - "FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis"
> https://issues.apache.org/jira/browse/FLINK-13733
>
> On Tue, Sep 29, 2020 at 11:49 AM Dian Fu <dian0511...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'd like to update the status about the blocker issues and build instabilities as there is only one month left and the number of blocker issues has increased a lot compared to last week.
> >
> > == Blockers:
> > https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334
> >
> > Currently there are 10 blocker issues
> > - 3 performance regressions (https://issues.apache.org/jira/browse/FLINK-19439, https://issues.apache.org/jira/browse/FLINK-19440, https://issues.apache.org/jira/browse/FLINK-19441)
> > - 3 Runtime (https://issues.apache.org/jira/browse/FLINK-19264, https://issues.apache.org/jira/browse/FLINK-19388, https://issues.apache.org/jira/browse/FLINK-19249)
> > - 1 HBase connector (https://issues.apache.org/jira/browse/FLINK-19445)
> > - 1 Application mode (https://issues.apache.org/jira/browse/FLINK-19154)
> > - 1 New source API (https://issues.apache.org/jira/browse/FLINK-19384)
> > - 1 Kinesis (https://issues.apache.org/jira/browse/FLINK-19332)
> >
> > == Recent notable build instabilities which still have no owners:
> > - New source API
> >   https://issues.apache.org/jira/browse/FLINK-19253 SourceReaderTestBase.testAddSplitToExistingFetcher hangs
> >   https://issues.apache.org/jira/browse/FLINK-19370 FileSourceTextLinesITCase.testContinuousTextFileSource failed as results mismatch
> >   https://issues.apache.org/jira/browse/FLINK-19427 SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent is instable
> >   https://issues.apache.org/jira/browse/FLINK-19437 FileSourceTextLinesITCase.testContinuousTextFileSource failed with "SimpleStreamFormat is not splittable, but found split end (0) different from file length (198)"
> >   https://issues.apache.org/jira/browse/FLINK-19448 CoordinatedSourceITCase.testEnumeratorReaderCommunication hangs
> > - Runtime/Network
> >   https://issues.apache.org/jira/browse/FLINK-19426 End-to-end test sometimes fails with PartitionConnectionException
> > - Unaligned Checkpoint
> >   https://issues.apache.org/jira/browse/FLINK-19027 UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel failed because of test timeout
> > - Table
> >   https://issues.apache.org/jira/browse/FLINK-19340 AggregateITCase.testListAggWithDistinct failed with "expected:<List(1,A, 2,B, 3,C#A, 4,EF)> but was:<List(1,A, 2,B, 3,C#A, 4,EF#EF)>"
> > - HBase connector
> >   https://issues.apache.org/jira/browse/FLINK-18570 SQLClientHBaseITCase.testHBase fails on azure
> >   https://issues.apache.org/jira/browse/FLINK-19447 HBaseConnectorITCase.HBaseTestingClusterAutoStarter failed with "Master not initialized after 200000ms"
> > - Avro
> >   https://issues.apache.org/jira/browse/FLINK-19422 Avro Confluent Schema Registry nightly end-to-end test failed with "Register operation timed out; error code: 50002"
> >
> > Regards,
> > Dian
> >
> > > On Sep 21, 2020, at 2:32 PM, Robert Metzger <rmetz...@apache.org> wrote:
> > >
> > > Hi all,
> > >
> > > An update on the release status:
> > > 1. We have 35 days = *5 weeks left until feature freeze*
> > > 2. There are currently 2 blockers for Flink <https://issues.apache.org/jira/browse/FLINK-19264?filter=12349334>, all making progress
> > > 3. We have 72 test instabilities <https://issues.apache.org/jira/browse/FLINK-19237> (down 7 from 2 weeks ago). I have pinged people to help address frequent or critical issues.
> > >
> > > Best,
> > > Robert
> > >
> > >
> > > On Mon, Sep 7, 2020 at 10:37 AM Robert Metzger <rmetz...@apache.org> wrote:
> > >
> > >> Hi all,
> > >>
> > >> another two weeks have passed. We now have 5 blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334> (Up 3 from 2 weeks ago), but they are all making progress.
> > >>
> > >> We currently have 79 test-instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>. Since the last report, a few have been resolved, and some others have been added.
> > >> I have checked the tickets, closed some old ones and pinged people to help resolve new or frequent ones.
> > >> Except for Kafka, there are no major clusters of test instabilities. Most failures are rarely failing tests across the entire system.
> > >>
> > >>
> > >> On Tue, Aug 25, 2020 at 9:05 AM Rui Li <lirui.fu...@gmail.com> wrote:
> > >>
> > >>> Thanks Dian for the pointer. I'll take a look.
> > >>>
> > >>> On Tue, Aug 25, 2020 at 3:02 PM Dian Fu <dian0511...@gmail.com> wrote:
> > >>>
> > >>>> Thanks Rui for the info. This issue (hive related) https://issues.apache.org/jira/browse/FLINK-19025 is marked as a blocker.
> > >>>>
> > >>>> Regards,
> > >>>> Dian
> > >>>>
> > >>>>> On Aug 25, 2020, at 2:58 PM, Rui Li <lirui.fu...@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi Dian,
> > >>>>>
> > >>>>> FLINK-18682 has been fixed. Is there any other blocker in the hive connector?
> > >>>>>
> > >>>>> On Tue, Aug 25, 2020 at 2:41 PM Dian Fu <dian0511...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> Two weeks have passed and it seems that none of the test stability issues have been addressed since then.
> > >>>>>>
> > >>>>>> Here is an updated status report of Blockers and Test instabilities:
> > >>>>>>
> > >>>>>> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>: Currently 2 blockers (1x Hive, 1x CI Infra)
> > >>>>>>
> > >>>>>> Test-Instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>: (total 80)
> > >>>>>>
> > >>>>>> Besides the issues already posted in previous mail, here are the new instability issues which should be taken care of:
> > >>>>>>
> > >>>>>> - FLINK-19012 (https://issues.apache.org/jira/browse/FLINK-19012)
> > >>>>>> E2E test fails with "Cannot register Closeable, this subtaskCheckpointCoordinator is already closed. Closing argument."
> > >>>>>>
> > >>>>>> -> This is a new issue that occurred recently. It has occurred several times and may indicate a bug somewhere and should be taken care of.
> > >>>>>>
> > >>>>>> - FLINK-9992 (https://issues.apache.org/jira/browse/FLINK-9992)
> > >>>>>> FsStorageLocationReferenceTest#testEncodeAndDecode failed in CI
> > >>>>>>
> > >>>>>> -> There is already a PR for it, and it needs review.
> > >>>>>>
> > >>>>>> - FLINK-18842 (https://issues.apache.org/jira/browse/FLINK-18842)
> > >>>>>> e2e test failed to download "localhost:9999/flink.tgz" in "Wordcount on Docker test"
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Aug 11, 2020, at 2:08 PM, Robert Metzger <rmetz...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>> Hi team,
> > >>>>>>>
> > >>>>>>> 2 weeks have passed since the last update. None of the test instabilities I've mentioned have been addressed since then.
> > >>>>>>>
> > >>>>>>> Here's an updated status report of Blockers and Test instabilities:
> > >>>>>>>
> > >>>>>>> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>: Currently 3 blockers (2x Hive, 1x CI Infra)
> > >>>>>>>
> > >>>>>>> Test-Instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580> (total 79) which failed recently or frequently:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown failed with "Timeout expired after 60000milliseconds while awaiting EndTxn(COMMIT)"
> > >>>>>>>
> > >>>>>>> - FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634> FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout expired after 60000milliseconds while awaiting InitProducerId"
> > >>>>>>>
> > >>>>>>> - FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908> FlinkKafkaProducerITCase testScaleUpAfterScalingDown Timeout expired while initializing transactional state in 60000ms.
> > >>>>>>>
> > >>>>>>> - FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
> > >>>>>>>
> > >>>>>>> --> The first three tickets seem related.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260> StreamingKafkaITCase failure on Azure
> > >>>>>>>
> > >>>>>>> --> This one seems really hard to reproduce
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart hangs
> > >>>>>>>
> > >>>>>>> - FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart produced no output for 900 seconds
> > >>>>>>>
> > >>>>>>> --> nobody seems to feel responsible for these tickets. My guess is that the S3 connector should have shorter timeouts / faster retries to finish within the 15 minutes test timeout. OR there is really something wrong with the code.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - FLINK-18333 UnsignedTypeConversionITCase failed caused by MariaDB4j "Asked to waitFor Program" <https://issues.apache.org/jira/browse/FLINK-18333>
> > >>>>>>>
> > >>>>>>> - FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159> ES6 ElasticsearchSinkITCase unstable
> > >>>>>>>
> > >>>>>>> - FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>
> > >>>>>>>
> > >>>>>>> - FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222> "Avro Confluent Schema Registry nightly end-to-end test" unstable with "Kafka cluster did not start after 120 seconds"
> > >>>>>>>
> > >>>>>>> - FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511> "RocksDB Memory Management end-to-end test" fails with "Current block cache usage 202123272 larger than expected memory limit 200000000"
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>>> Hi team,
> > >>>>>>>>
> > >>>>>>>> We would like to use this thread as a permanent thread for regularly syncing on stale blockers (need to have somebody assigned within a week and progress, or a good plan) and build instabilities (need to check if it's a blocker).
> > >>>>>>>>
> > >>>>>>>> Recent test-instabilities:
> > >>>>>>>>
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-17949 (KafkaShuffleITCase)
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka transactions)
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> It would be nice if the committers taking care of these components could look into the test failures.
> > >>>>>>>> If nothing happens, we'll personally reach out to people we believe could look into the ticket.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Dian & Robert
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Best regards!
> > >>>>> Rui Li
> > >>>>
> > >>>>
> > >>>
> > >>> --
> > >>> Best regards!
> > >>> Rui Li
> > >>>
> > >>
> > >
> >