Hi all,

Two weeks have passed and it seems that none of the test stability issues 
have been addressed since then.

Here is an updated status report of Blockers and Test instabilities:

Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
Currently 2 blockers (1x Hive, 1x CI Infra)

Test-Instabilities 
<https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580> (total 80):

Besides the issues already posted in the previous mail, here are the new 
instability issues that should be taken care of:

- FLINK-19012 <https://issues.apache.org/jira/browse/FLINK-19012>
E2E test fails with "Cannot register Closeable, this 
subtaskCheckpointCoordinator is already closed. Closing argument."

-> This is a new issue that appeared recently. It has occurred several times, 
which may indicate a bug somewhere, and it should be taken care of.
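
A hedged illustration of the failure mode: the following is a minimal Java 
sketch of the general pattern the error message suggests (not Flink's actual 
implementation). A close-guarded registry closes the argument and throws once 
close() has run, so a checkpoint thread racing against task shutdown fails 
with exactly this kind of message.

    import java.io.Closeable;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch of a close-guarded registry: register() after close()
    // both closes the argument and throws, matching the message above.
    class CloseableRegistrySketch implements Closeable {
        private final List<Closeable> resources = new ArrayList<>();
        private boolean closed;

        synchronized void register(Closeable c) throws IOException {
            if (closed) {
                c.close(); // "Closing argument."
                throw new IOException(
                    "Cannot register Closeable, already closed.");
            }
            resources.add(c);
        }

        @Override
        public synchronized void close() throws IOException {
            closed = true;
            for (Closeable c : resources) {
                c.close();
            }
            resources.clear();
        }
    }

If the E2E failure is such a race, the fix is probably on the caller side 
(checking whether the task is still running before registering) rather than 
in the registry itself.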

- FLINK-9992 <https://issues.apache.org/jira/browse/FLINK-9992>
FsStorageLocationReferenceTest#testEncodeAndDecode failed in CI

-> There is already a PR for it that needs review.

- FLINK-18842 <https://issues.apache.org/jira/browse/FLINK-18842>
e2e test failed to download "localhost:9999/flink.tgz" in "Wordcount on Docker 
test"


> On Aug 11, 2020, at 2:08 PM, Robert Metzger <rmetz...@apache.org> wrote:
> 
> Hi team,
> 
> 2 weeks have passed since the last update. None of the test stability issues
> I've mentioned have been addressed since then.
> 
> Here's an updated status report of Blockers and Test instabilities:
> 
> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
> Currently 3 blockers (2x Hive, 1x CI Infra)
> 
> Test-Instabilities
> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580> (total
> 79) which failed recently or frequently:
> 
> 
> - FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807>
> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown
> failed with "Timeout expired after 60000milliseconds while awaiting
> EndTxn(COMMIT)"
> 
> - FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634>
> FlinkKafkaProducerITCase.testRecoverCommittedTransaction
> failed with "Timeout expired after 60000milliseconds while awaiting
> InitProducerId"
> 
> - FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908>
> FlinkKafkaProducerITCase
> testScaleUpAfterScalingDown Timeout expired while initializing
> transactional state in 60000ms.
> 
> - FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733>
> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
> 
> --> The first three tickets seem related.
> 
> 
> - FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260>
> StreamingKafkaITCase failure on Azure
> 
> --> This one seems really hard to reproduce
> 
> 
> - FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768>
> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart
> hangs
> 
> - FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374>
> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
> produced no output for 900 seconds
> 
> --> Nobody seems to feel responsible for these tickets. My guess is that
> the S3 connector should have shorter timeouts / faster retries to finish
> within the 15-minute test timeout, OR there is really something wrong with
> the code.
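> 
> To make that guess concrete, here is a hedged sketch (the key names are
> Hadoop S3A options that Flink mirrors under the "s3." prefix; the values
> are assumptions to illustrate the direction, not tested settings) of how
> the tests could shorten the S3 timeouts and retries:
> 
>     import org.apache.flink.configuration.Configuration;
> 
>     public class S3TimeoutSketch {
>         public static void main(String[] args) {
>             Configuration conf = new Configuration();
>             // socket timeout in ms (defaults are much more patient)
>             conf.setString("s3.connection.timeout", "10000");
>             // give up on connection setup faster
>             conf.setString("s3.connection.establish.timeout", "5000");
>             // fewer AWS SDK retry attempts
>             conf.setString("s3.attempts.maximum", "3");
>             System.out.println(conf);
>         }
>     }
> 
> With settings in this direction the tests would fail fast inside the
> 15-minute timeout instead of hanging, which would at least surface the
> underlying error.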
> 
> 
> - FLINK-18333 <https://issues.apache.org/jira/browse/FLINK-18333>
> UnsignedTypeConversionITCase failed, caused by MariaDB4j "Asked to waitFor
> Program"
> 
> - FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159>
> ES6 ElasticsearchSinkITCase unstable
> 
> - FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949>
> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388
> expected:<310> but was:<0>
> 
> - FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222> "Avro
> Confluent Schema Registry nightly end-to-end test" unstable with "Kafka
> cluster did not start after 120 seconds"
> 
> - FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511> "RocksDB
> Memory Management end-to-end test" fails with "Current block cache usage
> 202123272 larger than expected memory limit 200000000"
> 
> 
> 
> 
> On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org> wrote:
> 
>> Hi team,
>> 
>> We would like to use this thread as a permanent thread for
>> regularly syncing on stale blockers (need to have somebody assigned within
>> a week and progress, or a good plan) and build instabilities (need to check
>> if it's a blocker).
>> 
>> Recent test-instabilities:
>> 
>>   - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
>>   - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
>>   - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
>>   - https://issues.apache.org/jira/browse/FLINK-17949
>>   (KafkaShuffleITCase)
>>   - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka
>>   transactions)
>> 
>> 
>> It would be nice if the committers taking care of these components could
>> look into the test failures.
>> If nothing happens, we'll personally reach out to the people we believe
>> could look into the ticket.
>> 
>> Best,
>> Dian & Robert
>> 
