Hi team,

2 weeks have passed since the last update. None of the test instabilities I've mentioned have been addressed since then.
Here's an updated status report of blockers and test instabilities:

Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
Currently 3 blockers (2x Hive, 1x CI Infra)

Test instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580> (79 in total) which failed recently or frequently:

- FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown failed with "Timeout expired after 60000milliseconds while awaiting EndTxn(COMMIT)"
- FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634> FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout expired after 60000milliseconds while awaiting InitProducerId"
- FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown "Timeout expired while initializing transactional state in 60000ms."
- FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
  --> The first three tickets seem related.
- FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260> StreamingKafkaITCase failure on Azure
  --> This one seems really hard to reproduce.
- FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart hangs
- FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart produced no output for 900 seconds
  --> Nobody seems to feel responsible for these two tickets. My guess is that the S3 connector should use shorter timeouts / faster retries to finish within the 15-minute test timeout, OR there is really something wrong with the code.
- FLINK-18333 <https://issues.apache.org/jira/browse/FLINK-18333> UnsignedTypeConversionITCase failed, caused by MariaDB4j "Asked to waitFor Program"
- FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159> ES6 ElasticsearchSinkITCase unstable
- FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388 expected:<310> but was:<0>
- FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222> "Avro Confluent Schema Registry nightly end-to-end test" unstable with "Kafka cluster did not start after 120 seconds"
- FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511> "RocksDB Memory Management end-to-end test" fails with "Current block cache usage 202123272 larger than expected memory limit 200000000"

On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi team,
>
> We would like to use this thread as a permanent thread for regularly
> syncing on stale blockers (which need to have somebody assigned within a
> week, plus progress or a good plan) and build instabilities (which need to
> be checked for blocker status).
>
> Recent test instabilities:
>
> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
> - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
> - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
> - https://issues.apache.org/jira/browse/FLINK-17949 (KafkaShuffleITCase)
> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka transactions)
>
> It would be nice if the committers taking care of these components could
> look into the test failures.
> If nothing happens, we'll personally reach out to the people we believe
> could look into each ticket.
>
> Best,
> Dian & Robert
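PS: regarding the S3 timeout guess above, one way to probe it would be to tighten the S3A client's connection/retry settings in the test's Hadoop configuration so a broken upload fails fast instead of eating the 15-minute budget. A minimal sketch of a core-site.xml fragment follows; the key names are the standard Hadoop S3A ones, but the values are illustrative assumptions (defaults and available keys vary by Hadoop version), not a verified fix for these tickets:

```xml
<!-- Hadoop core-site.xml fragment for the S3 ITCases.
     Values are illustrative assumptions, not tested against FLINK-16768/18374. -->
<configuration>
  <!-- Fail stuck connections much faster than the default (milliseconds). -->
  <property>
    <name>fs.s3a.connection.timeout</name>
    <value>30000</value>
  </property>
  <!-- Fewer low-level request attempts so failures surface quickly. -->
  <property>
    <name>fs.s3a.attempts.maximum</name>
    <value>3</value>
  </property>
  <!-- Cap the S3A retry policy as well (Hadoop 3.x). -->
  <property>
    <name>fs.s3a.retry.limit</name>
    <value>3</value>
  </property>
</configuration>
```

If the tests then fail with a concrete exception instead of hanging for 900 seconds, that would point at the environment rather than the recoverable-writer code.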