Hi all,

An update on the release status:

1. We have 35 days = *5 weeks left until feature freeze*
2. There are currently 2 blockers for Flink <https://issues.apache.org/jira/browse/FLINK-19264?filter=12349334>, all making progress
3. We have 72 test instabilities <https://issues.apache.org/jira/browse/FLINK-19237> (down 7 from 2 weeks ago). I have pinged people to help address frequent or critical issues.

Best,
Robert

On Mon, Sep 7, 2020 at 10:37 AM Robert Metzger <rmetz...@apache.org> wrote:

> Hi all,
>
> another two weeks have passed. We now have 5 blockers
> <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334> (up
> 3 from 2 weeks ago), but they are all making progress.
>
> We currently have 79 test-instabilities
> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>.
> Since the last report, a few have been resolved, and some others have been
> added.
> I have checked the tickets, closed some old ones and pinged people to help
> resolve new or frequent ones.
> Except for Kafka, there are no major clusters of test instabilities. Most
> failures come from tests that fail only rarely, spread across the entire system.
>
>
> On Tue, Aug 25, 2020 at 9:05 AM Rui Li <lirui.fu...@gmail.com> wrote:
>
>> Thanks Dian for the pointer. I'll take a look.
>>
>> On Tue, Aug 25, 2020 at 3:02 PM Dian Fu <dian0511...@gmail.com> wrote:
>>
>> > Thanks Rui for the info. This issue (Hive related)
>> > https://issues.apache.org/jira/browse/FLINK-19025 is marked as a blocker.
>> >
>> > Regards,
>> > Dian
>> >
>> > > On Aug 25, 2020, at 2:58 PM, Rui Li <lirui.fu...@gmail.com> wrote:
>> > >
>> > > Hi Dian,
>> > >
>> > > FLINK-18682 has been fixed. Is there any other blocker in the hive
>> > > connector?
>> > >
>> > > On Tue, Aug 25, 2020 at 2:41 PM Dian Fu <dian0511...@gmail.com> wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> Two weeks have passed and it seems that none of the test stability
>> > >> issues have been addressed since then.
>> > >>
>> > >> Here is an updated status report of Blockers and Test instabilities:
>> > >>
>> > >> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
>> > >> Currently 2 blockers (1x Hive, 1x CI Infra)
>> > >>
>> > >> Test-Instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>:
>> > >> (total 80)
>> > >>
>> > >> Besides the issues already posted in the previous mail, here are the new
>> > >> instability issues which should be taken care of:
>> > >>
>> > >> - FLINK-19012 (https://issues.apache.org/jira/browse/FLINK-19012)
>> > >> E2E test fails with "Cannot register Closeable, this
>> > >> subtaskCheckpointCoordinator is already closed. Closing argument."
>> > >>
>> > >> -> This is a new issue that occurred recently. It has occurred several times
>> > >> and may indicate a bug somewhere, so it should be taken care of.
>> > >>
>> > >> - FLINK-9992 (https://issues.apache.org/jira/browse/FLINK-9992)
>> > >> FsStorageLocationReferenceTest#testEncodeAndDecode failed in CI
>> > >>
>> > >> -> There is already a PR for it, which needs review.
>> > >>
>> > >> - FLINK-18842 (https://issues.apache.org/jira/browse/FLINK-18842)
>> > >> e2e test failed to download "localhost:9999/flink.tgz" in "Wordcount on
>> > >> Docker test"
>> > >>
>> > >>
>> > >>> On Aug 11, 2020, at 2:08 PM, Robert Metzger <rmetz...@apache.org> wrote:
>> > >>>
>> > >>> Hi team,
>> > >>>
>> > >>> 2 weeks have passed since the last update. None of the test instabilities
>> > >>> I've mentioned have been addressed since then.
>> > >>>
>> > >>> Here's an updated status report of Blockers and Test instabilities:
>> > >>>
>> > >>> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
>> > >>> Currently 3 blockers (2x Hive, 1x CI Infra)
>> > >>>
>> > >>> Test-Instabilities
>> > >>> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580> (total
>> > >>> 79) which failed recently or frequently:
>> > >>>
>> > >>>
>> > >>> - FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807>
>> > >>> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown
>> > >>> failed with "Timeout expired after 60000milliseconds while awaiting
>> > >>> EndTxn(COMMIT)"
>> > >>>
>> > >>> - FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634>
>> > >>> FlinkKafkaProducerITCase.testRecoverCommittedTransaction
>> > >>> failed with "Timeout expired after 60000milliseconds while awaiting
>> > >>> InitProducerId"
>> > >>>
>> > >>> - FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908>
>> > >>> FlinkKafkaProducerITCase
>> > >>> testScaleUpAfterScalingDown Timeout expired while initializing
>> > >>> transactional state in 60000ms.
>> > >>>
>> > >>> - FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733>
>> > >>> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
>> > >>>
>> > >>> --> The first three tickets seem related.
>> > >>>
>> > >>>
>> > >>> - FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260>
>> > >>> StreamingKafkaITCase failure on Azure
>> > >>>
>> > >>> --> This one seems really hard to reproduce
>> > >>>
>> > >>>
>> > >>> - FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768>
>> > >>> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart
>> > >>> hangs
>> > >>>
>> > >>> - FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374>
>> > >>> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
>> > >>> produced no output for 900 seconds
>> > >>>
>> > >>> --> nobody seems to feel responsible for these tickets. My guess is that
>> > >>> the S3 connector should have shorter timeouts / faster retries to finish
>> > >>> within the 15 minutes test timeout. OR there is really something wrong with
>> > >>> the code.
>> > >>>
>> > >>>
>> > >>> - FLINK-18333 <https://issues.apache.org/jira/browse/FLINK-18333>
>> > >>> UnsignedTypeConversionITCase failed caused by MariaDB4j
>> > >>> "Asked to waitFor Program"
>> > >>>
>> > >>> - FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159> ES6
>> > >>> ElasticsearchSinkITCase unstable
>> > >>>
>> > >>> - FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949>
>> > >>> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388
>> > >>> expected:<310> but was:<0>
>> > >>>
>> > >>> - FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222> "Avro
>> > >>> Confluent Schema Registry nightly end-to-end test" unstable with "Kafka
>> > >>> cluster did not start after 120 seconds"
>> > >>>
>> > >>> - FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511> "RocksDB
>> > >>> Memory Management end-to-end test" fails with "Current block cache usage
>> > >>> 202123272 larger than expected memory limit 200000000"
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org> wrote:
>> > >>>
>> > >>>> Hi team,
>> > >>>>
>> > >>>> We would like to use this thread as a permanent thread for
>> > >>>> regularly syncing on stale blockers (need to have somebody assigned within
>> > >>>> a week and progress, or a good plan) and build instabilities (need to check
>> > >>>> if it's a blocker).
>> > >>>>
>> > >>>> Recent test-instabilities:
>> > >>>>
>> > >>>> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
>> > >>>> - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
>> > >>>> - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
>> > >>>> - https://issues.apache.org/jira/browse/FLINK-17949 (KafkaShuffleITCase)
>> > >>>> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka transactions)
>> > >>>>
>> > >>>>
>> > >>>> It would be nice if the committers taking care of these components could
>> > >>>> look into the test failures.
>> > >>>> If nothing happens, we'll personally reach out to people we believe
>> > >>>> could look into the ticket.
>> > >>>>
>> > >>>> Best,
>> > >>>> Dian & Robert
>> > >>>>
>> > >>
>> > >>
>> > >
>> > > --
>> > > Best regards!
>> > > Rui Li
>> >
>> >
>>
>> --
>> Best regards!
>> Rui Li
>>
>