Hi team,

2 weeks have passed since the last update. None of the test instabilities I've mentioned have been addressed since then.
Here's an updated status report of blockers and test instabilities:

Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
Currently 3 blockers (2x Hive, 1x CI Infra)

Test instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580> (79 in total) which failed recently or frequently:

- FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown failed with "Timeout expired after 60000milliseconds while awaiting EndTxn(COMMIT)"
- FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634> FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout expired after 60000milliseconds while awaiting InitProducerId"
- FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown "Timeout expired while initializing transactional state in 60000ms."
- FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
  --> The first three tickets seem related.
- FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260> StreamingKafkaITCase failure on Azure
  --> This one seems really hard to reproduce.
- FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart hangs
- FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart produced no output for 900 seconds
  --> Nobody seems to feel responsible for these two tickets. My guess is that the S3 connector should use shorter timeouts / faster retries to finish within the 15-minute test timeout, OR there is really something wrong with the code.
- FLINK-18333 <https://issues.apache.org/jira/browse/FLINK-18333> UnsignedTypeConversionITCase failed, caused by MariaDB4j "Asked to waitFor Program"
- FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159> ES6 ElasticsearchSinkITCase unstable
- FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>
- FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222> "Avro Confluent Schema Registry nightly end-to-end test" unstable with "Kafka cluster did not start after 120 seconds"
- FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511> "RocksDB Memory Management end-to-end test" fails with "Current block cache usage 202123272 larger than expected memory limit 200000000"

On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi team,
>
> We would like to use this thread as a permanent thread for regularly
> syncing on stale blockers (which need to have somebody assigned within a
> week, plus progress or a good plan) and build instabilities (which need to
> be checked for blocker status).
>
> Recent test instabilities:
>
> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
> - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
> - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
> - https://issues.apache.org/jira/browse/FLINK-17949 (KafkaShuffleITCase)
> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka transactions)
>
> It would be nice if the committers taking care of these components could
> look into the test failures.
> If nothing happens, we'll personally reach out to the people we believe
> could look into each ticket.
>
> Best,
> Dian & Robert
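PS: regarding the S3 timeout guess above, one way to probe it would be to tighten the S3A client's connection/retry settings in the test's Hadoop configuration so a broken upload fails fast instead of eating the 15-minute budget. A minimal sketch of a core-site.xml fragment follows; the key names are the standard Hadoop S3A ones, but the values are illustrative assumptions (defaults and available keys vary by Hadoop version), not a verified fix for these tickets:

```xml
<!-- Hadoop core-site.xml fragment for the S3 ITCases.
     Values are illustrative assumptions, not tested against FLINK-16768/18374. -->
<configuration>
  <!-- Fail stuck connections much faster than the default (milliseconds). -->
  <property>
    <name>fs.s3a.connection.timeout</name>
    <value>30000</value>
  </property>
  <!-- Fewer low-level request attempts so failures surface quickly. -->
  <property>
    <name>fs.s3a.attempts.maximum</name>
    <value>3</value>
  </property>
  <!-- Cap the S3A retry policy as well (Hadoop 3.x). -->
  <property>
    <name>fs.s3a.retry.limit</name>
    <value>3</value>
  </property>
</configuration>
```

If the tests then fail with a concrete exception instead of hanging for 900 seconds, that would point at the environment rather than the recoverable-writer code.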