Hi all,

Another two weeks have passed. We now have 5 blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334> (up 3 from two weeks ago), but they are all making progress.
We currently have 79 test instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>. Since the last report, a few have been resolved and some others have been added. I have checked the tickets, closed some old ones, and pinged people to help resolve new or frequent ones. Except for Kafka, there are no major clusters of test instabilities. Most failures come from tests that fail only rarely, spread across the entire system.

On Tue, Aug 25, 2020 at 9:05 AM Rui Li <lirui.fu...@gmail.com> wrote:

> Thanks Dian for the pointer. I'll take a look.
>
> On Tue, Aug 25, 2020 at 3:02 PM Dian Fu <dian0511...@gmail.com> wrote:
>
> > Thanks Rui for the info. This issue (Hive-related)
> > https://issues.apache.org/jira/browse/FLINK-19025 is marked as a blocker.
> >
> > Regards,
> > Dian
> >
> > > On Aug 25, 2020, at 2:58 PM, Rui Li <lirui.fu...@gmail.com> wrote:
> > >
> > > Hi Dian,
> > >
> > > FLINK-18682 has been fixed. Is there any other blocker in the hive
> > > connector?
> > >
> > > On Tue, Aug 25, 2020 at 2:41 PM Dian Fu <dian0511...@gmail.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> Two weeks have passed and it seems that none of the test instability
> > >> issues have been addressed since then.
> > >>
> > >> Here is an updated status report of Blockers and Test instabilities:
> > >>
> > >> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
> > >> Currently 2 blockers (1x Hive, 1x CI Infra)
> > >>
> > >> Test-Instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>:
> > >> (total 80)
> > >>
> > >> Besides the issues already posted in the previous mail, here are the new
> > >> instability issues which should be taken care of:
> > >>
> > >> - FLINK-19012 (https://issues.apache.org/jira/browse/FLINK-19012)
> > >> E2E test fails with "Cannot register Closeable, this
> > >> subtaskCheckpointCoordinator is already closed. Closing argument."
> > >>
> > >> -> This is a new issue that occurred recently. It has occurred several times
> > >> and may indicate a bug somewhere, so it should be taken care of.
> > >>
> > >> - FLINK-9992 (https://issues.apache.org/jira/browse/FLINK-9992)
> > >> FsStorageLocationReferenceTest#testEncodeAndDecode failed in CI
> > >>
> > >> -> There is already a PR for it, which needs review.
> > >>
> > >> - FLINK-18842 (https://issues.apache.org/jira/browse/FLINK-18842)
> > >> e2e test failed to download "localhost:9999/flink.tgz" in "Wordcount on
> > >> Docker test"
> > >>
> > >>
> > >>> On Aug 11, 2020, at 2:08 PM, Robert Metzger <rmetz...@apache.org> wrote:
> > >>>
> > >>> Hi team,
> > >>>
> > >>> 2 weeks have passed since the last update. None of the test instabilities
> > >>> I've mentioned have been addressed since then.
> > >>>
> > >>> Here's an updated status report of Blockers and Test instabilities:
> > >>>
> > >>> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
> > >>> Currently 3 blockers (2x Hive, 1x CI Infra)
> > >>>
> > >>> Test-Instabilities
> > >>> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580> (total
> > >>> 79) which failed recently or frequently:
> > >>>
> > >>>
> > >>> - FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807>
> > >>> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown
> > >>> failed with "Timeout expired after 60000milliseconds while awaiting
> > >>> EndTxn(COMMIT)"
> > >>>
> > >>> - FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634>
> > >>> FlinkKafkaProducerITCase.testRecoverCommittedTransaction
> > >>> failed with "Timeout expired after 60000milliseconds while awaiting
> > >>> InitProducerId"
> > >>>
> > >>> - FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908>
> > >>> FlinkKafkaProducerITCase
> > >>> testScaleUpAfterScalingDown Timeout expired while initializing
> > >>> transactional state in 60000ms.
> > >>>
> > >>> - FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733>
> > >>> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
> > >>>
> > >>> --> The first three tickets seem related.
> > >>>
> > >>>
> > >>> - FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260>
> > >>> StreamingKafkaITCase failure on Azure
> > >>>
> > >>> --> This one seems really hard to reproduce
> > >>>
> > >>>
> > >>> - FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768>
> > >>> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart
> > >>> hangs
> > >>>
> > >>> - FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374>
> > >>> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
> > >>> produced no output for 900 seconds
> > >>>
> > >>> --> Nobody seems to feel responsible for these tickets. My guess is that
> > >>> the S3 connector should have shorter timeouts / faster retries to finish
> > >>> within the 15-minute test timeout. Or there is really something wrong with
> > >>> the code.
> > >>>
> > >>>
> > >>> - FLINK-18333 <https://issues.apache.org/jira/browse/FLINK-18333>
> > >>> UnsignedTypeConversionITCase failed caused by MariaDB4j
> > >>> "Asked to waitFor Program"
> > >>>
> > >>> - FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159>
> > >>> ES6 ElasticsearchSinkITCase unstable
> > >>>
> > >>> - FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949>
> > >>> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388
> > >>> expected:<310> but was:<0>
> > >>>
> > >>> - FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222>
> > >>> "Avro Confluent Schema Registry nightly end-to-end test" unstable with
> > >>> "Kafka cluster did not start after 120 seconds"
> > >>>
> > >>> - FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511>
> > >>> "RocksDB Memory Management end-to-end test" fails with "Current block cache
> > >>> usage 202123272 larger than expected memory limit 200000000"
> > >>>
> > >>>
> > >>> On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org> wrote:
> > >>>
> > >>>> Hi team,
> > >>>>
> > >>>> We would like to use this thread as a permanent thread for
> > >>>> regularly syncing on stale blockers (need to have somebody assigned within
> > >>>> a week and progress, or a good plan) and build instabilities (need to check
> > >>>> if it's a blocker).
> > >>>>
> > >>>> Recent test-instabilities:
> > >>>>
> > >>>> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
> > >>>> - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
> > >>>> - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
> > >>>> - https://issues.apache.org/jira/browse/FLINK-17949 (KafkaShuffleITCase)
> > >>>> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka transactions)
> > >>>>
> > >>>>
> > >>>> It would be nice if the committers taking care of these components could
> > >>>> look into the test failures.
> > >>>> If nothing happens, we'll personally reach out to people we believe
> > >>>> could look into the ticket.
> > >>>>
> > >>>> Best,
> > >>>> Dian & Robert
> > >>>>
> > >>
> > >>
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
> --
> Best regards!
> Rui Li
>