Hi all,

I'd like to give an update on the blocker issues and build instabilities, as there is only one month left and the number of blocker issues has increased a lot compared to last week.

== Blockers: https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334

Currently there are 10 blocker issues:
- 3 performance regressions (https://issues.apache.org/jira/browse/FLINK-19439, https://issues.apache.org/jira/browse/FLINK-19440, https://issues.apache.org/jira/browse/FLINK-19441)
- 3 Runtime (https://issues.apache.org/jira/browse/FLINK-19264, https://issues.apache.org/jira/browse/FLINK-19388, https://issues.apache.org/jira/browse/FLINK-19249)
- 1 HBase connector (https://issues.apache.org/jira/browse/FLINK-19445)
- 1 Application mode (https://issues.apache.org/jira/browse/FLINK-19154)
- 1 New source API (https://issues.apache.org/jira/browse/FLINK-19384)
- 1 Kinesis (https://issues.apache.org/jira/browse/FLINK-19332)

== Recent notable build instabilities which still have no owners:

- New source API
  https://issues.apache.org/jira/browse/FLINK-19253 SourceReaderTestBase.testAddSplitToExistingFetcher hangs
  https://issues.apache.org/jira/browse/FLINK-19370 FileSourceTextLinesITCase.testContinuousTextFileSource failed as results mismatch
  https://issues.apache.org/jira/browse/FLINK-19427 SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent is instable
  https://issues.apache.org/jira/browse/FLINK-19437 FileSourceTextLinesITCase.testContinuousTextFileSource failed with "SimpleStreamFormat is not splittable, but found split end (0) different from file length (198)"
  https://issues.apache.org/jira/browse/FLINK-19448 CoordinatedSourceITCase.testEnumeratorReaderCommunication hangs
- Runtime/Network
  https://issues.apache.org/jira/browse/FLINK-19426 End-to-end test sometimes fails with PartitionConnectionException
- Unaligned Checkpoint
  https://issues.apache.org/jira/browse/FLINK-19027 UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel failed because of test timeout
- Table
  https://issues.apache.org/jira/browse/FLINK-19340 AggregateITCase.testListAggWithDistinct failed with "expected:<List(1,A, 2,B, 3,C#A, 4,EF)> but was:<List(1,A, 2,B, 3,C#A, 4,EF#EF)>"
- HBase connector
  https://issues.apache.org/jira/browse/FLINK-18570 SQLClientHBaseITCase.testHBase fails on azure
  https://issues.apache.org/jira/browse/FLINK-19447 HBaseConnectorITCase.HBaseTestingClusterAutoStarter failed with "Master not initialized after 200000ms"
- Avro
  https://issues.apache.org/jira/browse/FLINK-19422 Avro Confluent Schema Registry nightly end-to-end test failed with "Register operation timed out; error code: 50002"

Regards,
Dian

> On Sep 21, 2020, at 2:32 PM, Robert Metzger <rmetz...@apache.org> wrote:
>
> Hi all,
>
> An update on the release status:
> 1. We have 35 days = *5 weeks left until feature freeze*
> 2. There are currently 2 blockers for Flink
> <https://issues.apache.org/jira/browse/FLINK-19264?filter=12349334>, all
> making progress
> 3. We have 72 test instabilities
> <https://issues.apache.org/jira/browse/FLINK-19237> (down 7 from 2 weeks
> ago). I have pinged people to help address frequent or critical issues.
>
> Best,
> Robert
>
>
> On Mon, Sep 7, 2020 at 10:37 AM Robert Metzger <rmetz...@apache.org> wrote:
>
>> Hi all,
>>
>> another two weeks have passed. We now have 5 blockers
>> <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334> (Up
>> 3 from 2 weeks ago), but they are all making progress.
>>
>> We currently have 79 test-instabilities
>> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>.
>> Since the last report, a few have been resolved, and some others have been
>> added.
>> I have checked the tickets, closed some old ones and pinged people to help
>> resolve new or frequent ones.
>> Except for Kafka, there are no major clusters of test instabilities. Most
>> failures come from rarely failing tests spread across the entire system.
>>
>>
>> On Tue, Aug 25, 2020 at 9:05 AM Rui Li <lirui.fu...@gmail.com> wrote:
>>
>>> Thanks Dian for the pointer. I'll take a look.
>>>
>>> On Tue, Aug 25, 2020 at 3:02 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>
>>>> Thanks Rui for the info. This issue (hive related)
>>>> https://issues.apache.org/jira/browse/FLINK-19025 is marked as a
>>>> blocker.
>>>>
>>>> Regards,
>>>> Dian
>>>>
>>>>> On Aug 25, 2020, at 2:58 PM, Rui Li <lirui.fu...@gmail.com> wrote:
>>>>>
>>>>> Hi Dian,
>>>>>
>>>>> FLINK-18682 has been fixed. Is there any other blocker in the hive
>>>>> connector?
>>>>>
>>>>> On Tue, Aug 25, 2020 at 2:41 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Two weeks have passed and it seems that none of the test instability
>>>>>> issues have been addressed since then.
>>>>>>
>>>>>> Here is an updated status report of Blockers and Test instabilities:
>>>>>>
>>>>>> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
>>>>>> Currently 2 blockers (1x Hive, 1x CI Infra)
>>>>>>
>>>>>> Test-Instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>:
>>>>>> (total 80)
>>>>>>
>>>>>> Besides the issues already posted in the previous mail, here are the new
>>>>>> instability issues which should be taken care of:
>>>>>>
>>>>>> - FLINK-19012 (https://issues.apache.org/jira/browse/FLINK-19012)
>>>>>> E2E test fails with "Cannot register Closeable, this
>>>>>> subtaskCheckpointCoordinator is already closed. Closing argument."
>>>>>>
>>>>>> -> This is a new issue that appeared recently. It has occurred several times
>>>>>> and may indicate a bug somewhere, so it should be taken care of.
>>>>>>
>>>>>> - FLINK-9992 (https://issues.apache.org/jira/browse/FLINK-9992)
>>>>>> FsStorageLocationReferenceTest#testEncodeAndDecode failed in CI
>>>>>>
>>>>>> -> There is already a PR for it, which needs review.
>>>>>>
>>>>>> - FLINK-18842 (https://issues.apache.org/jira/browse/FLINK-18842)
>>>>>> e2e test failed to download "localhost:9999/flink.tgz" in "Wordcount on
>>>>>> Docker test"
>>>>>>
>>>>>>
>>>>>>> On Aug 11, 2020, at 2:08 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>> 2 weeks have passed since the last update. None of the test instabilities
>>>>>>> I've mentioned have been addressed since then.
>>>>>>>
>>>>>>> Here's an updated status report of Blockers and Test instabilities:
>>>>>>>
>>>>>>> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
>>>>>>> Currently 3 blockers (2x Hive, 1x CI Infra)
>>>>>>>
>>>>>>> Test-Instabilities
>>>>>>> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580> (total
>>>>>>> 79) which failed recently or frequently:
>>>>>>>
>>>>>>>
>>>>>>> - FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807>
>>>>>>> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown
>>>>>>> failed with "Timeout expired after 60000milliseconds while awaiting
>>>>>>> EndTxn(COMMIT)"
>>>>>>>
>>>>>>> - FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634>
>>>>>>> FlinkKafkaProducerITCase.testRecoverCommittedTransaction
>>>>>>> failed with "Timeout expired after 60000milliseconds while awaiting
>>>>>>> InitProducerId"
>>>>>>>
>>>>>>> - FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908>
>>>>>>> FlinkKafkaProducerITCase
>>>>>>> testScaleUpAfterScalingDown Timeout expired while initializing
>>>>>>> transactional state in 60000ms.
>>>>>>>
>>>>>>> - FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733>
>>>>>>> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
>>>>>>>
>>>>>>> --> The first three tickets seem related.
>>>>>>>
>>>>>>>
>>>>>>> - FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260>
>>>>>>> StreamingKafkaITCase failure on Azure
>>>>>>>
>>>>>>> --> This one seems really hard to reproduce
>>>>>>>
>>>>>>>
>>>>>>> - FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768>
>>>>>>> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart
>>>>>>> hangs
>>>>>>>
>>>>>>> - FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374>
>>>>>>> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
>>>>>>> produced no output for 900 seconds
>>>>>>>
>>>>>>> --> Nobody seems to feel responsible for these tickets. My guess is that
>>>>>>> the S3 connector should have shorter timeouts / faster retries to finish
>>>>>>> within the 15-minute test timeout. OR there is really something wrong with
>>>>>>> the code.
>>>>>>>
>>>>>>>
>>>>>>> - FLINK-18333 <https://issues.apache.org/jira/browse/FLINK-18333>
>>>>>>> UnsignedTypeConversionITCase failed caused by MariaDB4j
>>>>>>> "Asked to waitFor Program"
>>>>>>>
>>>>>>> - FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159> ES6
>>>>>>> ElasticsearchSinkITCase unstable
>>>>>>>
>>>>>>> - FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949>
>>>>>>> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388
>>>>>>> expected:<310> but was:<0>
>>>>>>>
>>>>>>> - FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222> "Avro
>>>>>>> Confluent Schema Registry nightly end-to-end test" unstable with "Kafka
>>>>>>> cluster did not start after 120 seconds"
>>>>>>>
>>>>>>> - FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511> "RocksDB
>>>>>>> Memory Management end-to-end test" fails with "Current block cache usage
>>>>>>> 202123272 larger than expected memory limit 200000000"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi team,
>>>>>>>>
>>>>>>>> We would like to use this thread as a permanent thread for
>>>>>>>> regularly syncing on stale blockers (need to have somebody assigned within
>>>>>>>> a week and progress, or a good plan) and build instabilities (need to check
>>>>>>>> if it's a blocker).
>>>>>>>>
>>>>>>>> Recent test-instabilities:
>>>>>>>>
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-17949 (KafkaShuffleITCase)
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka transactions)
>>>>>>>>
>>>>>>>>
>>>>>>>> It would be nice if the committers taking care of these components could
>>>>>>>> look into the test failures.
>>>>>>>> If nothing happens, we'll personally reach out to the people we believe
>>>>>>>> could look into the ticket.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Dian & Robert
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Best regards!
>>>>> Rui Li
>>>>
>>>>
>>>
>>> --
>>> Best regards!
>>> Rui Li
>>>
>>