Hi all,

I'd like to give an update on the blocker issues and build instabilities, 
as there is only one month left and the number of blocker issues has increased 
a lot compared to last week.

== Blockers: https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334

Currently there are 10 blocker issues:
- 3 performance regressions (https://issues.apache.org/jira/browse/FLINK-19439,
  https://issues.apache.org/jira/browse/FLINK-19440,
  https://issues.apache.org/jira/browse/FLINK-19441)
- 3 Runtime (https://issues.apache.org/jira/browse/FLINK-19264,
  https://issues.apache.org/jira/browse/FLINK-19388,
  https://issues.apache.org/jira/browse/FLINK-19249)
- 1 HBase connector (https://issues.apache.org/jira/browse/FLINK-19445)
- 1 Application mode (https://issues.apache.org/jira/browse/FLINK-19154)
- 1 New source API (https://issues.apache.org/jira/browse/FLINK-19384)
- 1 Kinesis (https://issues.apache.org/jira/browse/FLINK-19332)

== Recent notable build instabilities which still have no owners:
- New source API
   https://issues.apache.org/jira/browse/FLINK-19253
   SourceReaderTestBase.testAddSplitToExistingFetcher hangs
   https://issues.apache.org/jira/browse/FLINK-19370
   FileSourceTextLinesITCase.testContinuousTextFileSource failed as results mismatch
   https://issues.apache.org/jira/browse/FLINK-19427
   SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent is instable
   https://issues.apache.org/jira/browse/FLINK-19437
   FileSourceTextLinesITCase.testContinuousTextFileSource failed with
   "SimpleStreamFormat is not splittable, but found split end (0) different from
   file length (198)"
   https://issues.apache.org/jira/browse/FLINK-19448
   CoordinatedSourceITCase.testEnumeratorReaderCommunication hangs
- Runtime/Network 
   https://issues.apache.org/jira/browse/FLINK-19426
   End-to-end test sometimes fails with PartitionConnectionException
- Unaligned Checkpoint
   https://issues.apache.org/jira/browse/FLINK-19027
   UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel
   failed because of test timeout
- Table
   https://issues.apache.org/jira/browse/FLINK-19340
   AggregateITCase.testListAggWithDistinct failed with "expected:<List(1,A, 2,B,
   3,C#A, 4,EF)> but was:<List(1,A, 2,B, 3,C#A, 4,EF#EF)>"
- HBase connector
   https://issues.apache.org/jira/browse/FLINK-18570
   SQLClientHBaseITCase.testHBase fails on azure
   https://issues.apache.org/jira/browse/FLINK-19447
   HBaseConnectorITCase.HBaseTestingClusterAutoStarter failed with "Master not
   initialized after 200000ms"
- Avro
   https://issues.apache.org/jira/browse/FLINK-19422
   Avro Confluent Schema Registry nightly end-to-end test failed with "Register
   operation timed out; error code: 50002"

Regards,
Dian

> On Sep 21, 2020, at 2:32 PM, Robert Metzger <rmetz...@apache.org> wrote:
> 
> Hi all,
> 
> An update on the release status:
> 1. We have 35 days = *5 weeks left until feature freeze*
> 2. There are currently 2 blockers for Flink
> <https://issues.apache.org/jira/browse/FLINK-19264?filter=12349334>, both
> making progress
> 3. We have 72 test instabilities
> <https://issues.apache.org/jira/browse/FLINK-19237> (down 7 from 2 weeks
> ago). I have pinged people to help address frequent or critical issues.
> 
> Best,
> Robert
> 
> 
> On Mon, Sep 7, 2020 at 10:37 AM Robert Metzger <rmetz...@apache.org> wrote:
> 
>> Hi all,
>> 
>> another two weeks have passed. We now have 5 blockers
>> <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334> (Up
>> 3 from 2 weeks ago), but they are all making progress.
>> 
>> We currently have 79 test-instabilities
>> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>.
>> Since the last report, a few have been resolved and some others have been
>> added.
>> I have checked the tickets, closed some old ones and pinged people to help
>> resolve new or frequent ones.
>> Except for Kafka, there are no major clusters of test instabilities. Most
>> failures come from rarely failing tests spread across the entire system.
>> 
>> 
>> On Tue, Aug 25, 2020 at 9:05 AM Rui Li <lirui.fu...@gmail.com> wrote:
>> 
>>> Thanks Dian for the pointer. I'll take a look.
>>> 
>>> On Tue, Aug 25, 2020 at 3:02 PM Dian Fu <dian0511...@gmail.com> wrote:
>>> 
>>>> Thanks Rui for the info. This Hive-related issue
>>>> https://issues.apache.org/jira/browse/FLINK-19025 is marked as a blocker.
>>>> 
>>>> Regards,
>>>> Dian
>>>> 
>>>>> On Aug 25, 2020, at 2:58 PM, Rui Li <lirui.fu...@gmail.com> wrote:
>>>>> 
>>>>> Hi Dian,
>>>>> 
>>>>> FLINK-18682 has been fixed. Is there any other blocker in the hive
>>>>> connector?
>>>>> 
>>>>> On Tue, Aug 25, 2020 at 2:41 PM Dian Fu <dian0511...@gmail.com> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Two weeks have passed and it seems that none of the test instability
>>>>>> issues have been addressed since then.
>>>>>> 
>>>>>> Here is an updated status report of Blockers and Test instabilities:
>>>>>> 
>>>>>> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
>>>>>> Currently 2 blockers (1x Hive, 1x CI Infra)
>>>>>> 
>>>>>> Test-Instabilities <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>:
>>>>>> (total 80)
>>>>>> 
>>>>>> Besides the issues already posted in the previous mail, here are the new
>>>>>> instability issues which should be taken care of:
>>>>>> 
>>>>>> - FLINK-19012 (https://issues.apache.org/jira/browse/FLINK-19012)
>>>>>> E2E test fails with "Cannot register Closeable, this
>>>>>> subtaskCheckpointCoordinator is already closed. Closing argument."
>>>>>> 
>>>>>> -> This is a new issue that appeared recently. It has occurred several
>>>>>> times, may indicate a bug somewhere, and should be taken care of.
>>>>>> 
>>>>>> - FLINK-9992 (https://issues.apache.org/jira/browse/FLINK-9992)
>>>>>> FsStorageLocationReferenceTest#testEncodeAndDecode failed in CI
>>>>>> 
>>>>>> -> There is already a PR for it, which needs review.
>>>>>> 
>>>>>> - FLINK-18842 (https://issues.apache.org/jira/browse/FLINK-18842)
>>>>>> e2e test failed to download "localhost:9999/flink.tgz" in "Wordcount on
>>>>>> Docker test"
>>>>>> 
>>>>>> 
>>>>>>> On Aug 11, 2020, at 2:08 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>> 
>>>>>>> Hi team,
>>>>>>> 
>>>>>>> 2 weeks have passed since the last update. None of the test
>>>>>>> instabilities I've mentioned have been addressed since then.
>>>>>>> 
>>>>>>> Here's an updated status report of Blockers and Test instabilities:
>>>>>>> 
>>>>>>> Blockers <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
>>>>>>> Currently 3 blockers (2x Hive, 1x CI Infra)
>>>>>>> 
>>>>>>> Test-Instabilities
>>>>>>> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>
>>>>>>> (total 79) which failed recently or frequently:
>>>>>>> 
>>>>>>> 
>>>>>>> - FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807>
>>>>>>> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown
>>>>>>> failed with "Timeout expired after 60000milliseconds while awaiting
>>>>>>> EndTxn(COMMIT)"
>>>>>>> 
>>>>>>> - FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634>
>>>>>>> FlinkKafkaProducerITCase.testRecoverCommittedTransaction
>>>>>>> failed with "Timeout expired after 60000milliseconds while awaiting
>>>>>>> InitProducerId"
>>>>>>> 
>>>>>>> - FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908>
>>>>>>> FlinkKafkaProducerITCase
>>>>>>> testScaleUpAfterScalingDown Timeout expired while initializing
>>>>>>> transactional state in 60000ms.
>>>>>>> 
>>>>>>> - FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733>
>>>>>>> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
>>>>>>> 
>>>>>>> --> The first three tickets seem related.
>>>>>>> 
>>>>>>> 
>>>>>>> - FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260>
>>>>>>> StreamingKafkaITCase failure on Azure
>>>>>>> 
>>>>>>> --> This one seems really hard to reproduce
>>>>>>> 
>>>>>>> 
>>>>>>> - FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768>
>>>>>>> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart
>>>>>>> hangs
>>>>>>> 
>>>>>>> - FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374>
>>>>>>> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
>>>>>>> produced no output for 900 seconds
>>>>>>> 
>>>>>>> --> Nobody seems to feel responsible for these tickets. My guess is that
>>>>>>> the S3 connector should have shorter timeouts / faster retries to finish
>>>>>>> within the 15-minute test timeout, or there is really something wrong
>>>>>>> with the code.
>>>>>>> 
>>>>>>> 
>>>>>>> - FLINK-18333 <https://issues.apache.org/jira/browse/FLINK-18333>
>>>>>>> UnsignedTypeConversionITCase failed caused by MariaDB4j
>>>>>>> "Asked to waitFor Program"
>>>>>>>
>>>>>>> - FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159>
>>>>>>> ES6 ElasticsearchSinkITCase unstable
>>>>>>> 
>>>>>>> - FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949>
>>>>>>> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388
>>>>>>> expected:<310> but was:<0>
>>>>>>> 
>>>>>>> - FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222>
>>>>>>> "Avro Confluent Schema Registry nightly end-to-end test" unstable with
>>>>>>> "Kafka cluster did not start after 120 seconds"
>>>>>>> 
>>>>>>> - FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511>
>>>>>>> "RocksDB Memory Management end-to-end test" fails with "Current block
>>>>>>> cache usage 202123272 larger than expected memory limit 200000000"
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>> 
>>>>>>>> Hi team,
>>>>>>>> 
>>>>>>>> We would like to use this thread as a permanent thread for
>>>>>>>> regularly syncing on stale blockers (need to have somebody assigned
>>>>>>>> within a week and progress, or a good plan) and build instabilities
>>>>>>>> (need to check if it's a blocker).
>>>>>>>> 
>>>>>>>> Recent test-instabilities:
>>>>>>>> 
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-17949 (KafkaShuffleITCase)
>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka transactions)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> It would be nice if the committers taking care of these components
>>>>>>>> could look into the test failures.
>>>>>>>> If nothing happens, we'll personally reach out to the people we believe
>>>>>>>> could look into the tickets.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Dian & Robert
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards!
>>>>> Rui Li
>>>> 
>>>> 
>>> 
>>> --
>>> Best regards!
>>> Rui Li
>>> 
>> 
