Thanks for monitoring the release progress and kindly reminding us, Robert!

Minor: the link below shows the complete list of existing blockers:
https://issues.apache.org/jira/issues/?filter=12349334

Best Regards,
Yu


On Tue, 13 Oct 2020 at 03:03, Robert Metzger <rmetz...@apache.org> wrote:

> Hi all!
>
> According to the plan
> <https://cwiki.apache.org/confluence/display/FLINK/1.12+Release> discussed
> earlier in the release cycle, the feature freeze is expected to happen in
> the week of October 26th. That's 2.5 weeks from now.
>
> I believe now is the time to discuss if we want to postpone the feature
> freeze.
> In my opinion, I would prefer to stick to the original schedule and rather
> delay features to the 1.13 release if they are not ready yet.
>
> From a stability perspective, we currently have the following situation:
> - 6 blockers:
> https://issues.apache.org/jira/browse/FLINK-19154?filter=12349334
> Most of them are making progress; I notified people on those where the
> status is unclear.
> - 80 test instabilities:
>
> https://issues.apache.org/jira/browse/FLINK-18117?filter=12348580&jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20test-stability%20ORDER%20BY%20updated%20DESC%2C%20created%20DESC
> - The CI system is a bit unstable these days: The e2e tests are often
> timing out. I will look into options to mitigate this.
>
>
>
> Drilling deeper into the test instabilities, these are some notable
> clusters of test instabilities (with recent failures, usually more than
> once) [tests marked with >> have nobody assigned]
>
> E2E tests, probably all test infrastructure
> >> "Kerberized YARN per-job on Docker test" fails with "Could not start
> hadoop cluster." https://issues.apache.org/jira/browse/FLINK-18117
> >> SQL Client end-to-end test (Old planner) Elasticsearch (v7.5.1) failed
> due to download error https://issues.apache.org/jira/browse/FLINK-17424
> - "ES6 ElasticsearchSinkITCase unstable"
> https://issues.apache.org/jira/browse/FLINK-17159
> - "Avro Confluent Schema Registry nightly end-to-end test failed with
> "Register operation timed out; error code: 50002""
> https://issues.apache.org/jira/browse/FLINK-19422
> - "SQLClientHBaseITCase.testHBase fails on azure"
> https://issues.apache.org/jira/browse/FLINK-18570
>
> New Source API
> - "SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent is instable"
> https://issues.apache.org/jira/browse/FLINK-19427
> >> "CoordinatedSourceITCase.testEnumeratorReaderCommunication hangs"
> https://issues.apache.org/jira/browse/FLINK-19448
> - "SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent gets stuck"
> https://issues.apache.org/jira/browse/FLINK-19489
>
>
> Distributed Coordination
> - "LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with
> "NoResourceAvailableException: Could not allocate the required slot within
> slot request timeout"" https://issues.apache.org/jira/browse/FLINK-19237
> - "TaskExecutorSubmissionTest#testFailingScheduleOrUpdateConsumers"
> https://issues.apache.org/jira/browse/FLINK-17458
> - "ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange
> times out" https://issues.apache.org/jira/browse/FLINK-19514
> - "ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange:
> ZooKeeper unexpectedly modified"
> https://issues.apache.org/jira/browse/FLINK-19458
>
> Kafka
> >> "KafkaITCase failing with "Failed to send data to Kafka: This server
> does not host this topic-partition""
> https://issues.apache.org/jira/browse/FLINK-18444
> >> "KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388
> expected:<310> but was:<0>"
> https://issues.apache.org/jira/browse/FLINK-17949
> - "KafkaITCase.testKeyValueSupport failure due to assertion error."
> https://issues.apache.org/jira/browse/FLINK-15745
> - "KafkaITCase.testStartFromGroupOffsets times out on azure"
> https://issues.apache.org/jira/browse/FLINK-18648
> - "FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis"
> https://issues.apache.org/jira/browse/FLINK-13733
>
>
>
> On Tue, Sep 29, 2020 at 11:49 AM Dian Fu <dian0511...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'd like to update the status of the blocker issues and build
> > instabilities, as there is only one month left and the number of blocker
> > issues has increased a lot compared to last week.
> >
> > == Blockers:
> > https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334
> >
> > Currently there are 10 blocker issues:
> > - 3 performance regressions (
> > https://issues.apache.org/jira/browse/FLINK-19439,
> > https://issues.apache.org/jira/browse/FLINK-19440,
> > https://issues.apache.org/jira/browse/FLINK-19441)
> > - 3 Runtime (https://issues.apache.org/jira/browse/FLINK-19264,
> > https://issues.apache.org/jira/browse/FLINK-19388,
> > https://issues.apache.org/jira/browse/FLINK-19249)
> > - 1 HBase connector (https://issues.apache.org/jira/browse/FLINK-19445)
> > - 1 Application mode (https://issues.apache.org/jira/browse/FLINK-19154)
> > - 1 New source API (https://issues.apache.org/jira/browse/FLINK-19384)
> > - 1 Kinesis (https://issues.apache.org/jira/browse/FLINK-19332)
> >
> > == Recent notable build instabilities which still have no owners:
> > - New source API
> >    https://issues.apache.org/jira/browse/FLINK-19253
> > SourceReaderTestBase.testAddSplitToExistingFetcher hangs
> >    https://issues.apache.org/jira/browse/FLINK-19370
> > FileSourceTextLinesITCase.testContinuousTextFileSource failed as results
> > mismatch
> >    https://issues.apache.org/jira/browse/FLINK-19427
> > SplitFetcherTest.testNotifiesWhenGoingIdleConcurrent is instable
> >    https://issues.apache.org/jira/browse/FLINK-19437
> > FileSourceTextLinesITCase.testContinuousTextFileSource failed with
> > "SimpleStreamFormat is not splittable, but found split end (0) different
> > from file length (198)"
> >    https://issues.apache.org/jira/browse/FLINK-19448
> > CoordinatedSourceITCase.testEnumeratorReaderCommunication hangs
> > - Runtime/Network
> >    https://issues.apache.org/jira/browse/FLINK-19426
> > End-to-end test sometimes fails with PartitionConnectionException
> > - Unaligned Checkpoint
> >    https://issues.apache.org/jira/browse/FLINK-19027
> > UnalignedCheckpointITCase.shouldPerformUnalignedCheckpointOnParallelRemoteChannel
> > failed because of test timeout
> > - Table
> >    https://issues.apache.org/jira/browse/FLINK-19340
> > AggregateITCase.testListAggWithDistinct failed with "expected:<List(1,A,
> > 2,B, 3,C#A, 4,EF)> but was:<List(1,A, 2,B, 3,C#A, 4,EF#EF)>"
> > - HBase connector
> >    https://issues.apache.org/jira/browse/FLINK-18570
> > SQLClientHBaseITCase.testHBase fails on azure
> >    https://issues.apache.org/jira/browse/FLINK-19447
> > HBaseConnectorITCase.HBaseTestingClusterAutoStarter failed with "Master not
> > initialized after 200000ms"
> > - Avro
> >    https://issues.apache.org/jira/browse/FLINK-19422
> > Avro Confluent Schema Registry nightly end-to-end test failed with
> > "Register operation timed out; error code: 50002"
> >
> > Regards,
> > Dian
> >
> > > On Sep 21, 2020, at 2:32 PM, Robert Metzger <rmetz...@apache.org> wrote:
> > >
> > > Hi all,
> > >
> > > An update on the release status:
> > > 1. We have 35 days = *5 weeks left until feature freeze*
> > > 2. There are currently 2 blockers for Flink
> > > <https://issues.apache.org/jira/browse/FLINK-19264?filter=12349334>,
> > > all making progress
> > > 3. We have 72 test instabilities
> > > <https://issues.apache.org/jira/browse/FLINK-19237> (down 7 from 2
> > > weeks ago). I have pinged people to help address frequent or critical
> > > issues.
> > >
> > > Best,
> > > Robert
> > >
> > >
> > > On Mon, Sep 7, 2020 at 10:37 AM Robert Metzger <rmetz...@apache.org>
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> Another two weeks have passed. We now have 5 blockers
> > >> <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>
> > >> (up 3 from 2 weeks ago), but they are all making progress.
> > >>
> > >> We currently have 79 test instabilities
> > >> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>.
> > >> Since the last report, a few have been resolved, and some others have
> > >> been added.
> > >> I have checked the tickets, closed some old ones and pinged people to
> > >> help resolve new or frequent ones.
> > >> Except for Kafka, there are no major clusters of test instabilities.
> > >> Most failures come from tests that fail only rarely, spread across the
> > >> entire system.
> > >>
> > >>
> > >> On Tue, Aug 25, 2020 at 9:05 AM Rui Li <lirui.fu...@gmail.com> wrote:
> > >>
> > >>> Thanks Dian for the pointer. I'll take a look.
> > >>>
> > >>> On Tue, Aug 25, 2020 at 3:02 PM Dian Fu <dian0511...@gmail.com> wrote:
> > >>>
> > >>>> Thanks Rui for the info. This issue (Hive related)
> > >>>> https://issues.apache.org/jira/browse/FLINK-19025 is marked as a
> > >>>> blocker.
> > >>>>
> > >>>> Regards,
> > >>>> Dian
> > >>>>
> > >>>>> On Aug 25, 2020, at 2:58 PM, Rui Li <lirui.fu...@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi Dian,
> > >>>>>
> > >>>>> FLINK-18682 has been fixed. Is there any other blocker in the hive
> > >>>>> connector?
> > >>>>>
> > >>>>> On Tue, Aug 25, 2020 at 2:41 PM Dian Fu <dian0511...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> Two weeks have passed and it seems that none of the test instability
> > >>>>>> issues have been addressed since then.
> > >>>>>>
> > >>>>>> Here is an updated status report of Blockers and Test instabilities:
> > >>>>>>
> > >>>>>> Blockers
> > >>>>>> <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
> > >>>>>> Currently 2 blockers (1x Hive, 1x CI Infra)
> > >>>>>>
> > >>>>>> Test-Instabilities
> > >>>>>> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>:
> > >>>>>> (total 80)
> > >>>>>>
> > >>>>>> Besides the issues already posted in the previous mail, here are the
> > >>>>>> new instability issues which should be taken care of:
> > >>>>>>
> > >>>>>> - FLINK-19012 (https://issues.apache.org/jira/browse/FLINK-19012)
> > >>>>>> E2E test fails with "Cannot register Closeable, this
> > >>>>>> subtaskCheckpointCoordinator is already closed. Closing argument."
> > >>>>>>
> > >>>>>> -> This is a new issue that occurred recently. It has occurred several
> > >>>>>> times, may indicate a bug somewhere, and should be taken care of.
> > >>>>>>
> > >>>>>> - FLINK-9992 (https://issues.apache.org/jira/browse/FLINK-9992)
> > >>>>>> FsStorageLocationReferenceTest#testEncodeAndDecode failed in CI
> > >>>>>>
> > >>>>>> -> There is already a PR for it, which needs review.
> > >>>>>>
> > >>>>>> - FLINK-18842 (https://issues.apache.org/jira/browse/FLINK-18842)
> > >>>>>> e2e test failed to download "localhost:9999/flink.tgz" in "Wordcount
> > >>>>>> on Docker test"
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Aug 11, 2020, at 2:08 PM, Robert Metzger <rmetz...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>> Hi team,
> > >>>>>>>
> > >>>>>>> 2 weeks have passed since the last update. None of the test
> > >>>>>>> instabilities I've mentioned have been addressed since then.
> > >>>>>>>
> > >>>>>>> Here's an updated status report of Blockers and Test instabilities:
> > >>>>>>>
> > >>>>>>> Blockers
> > >>>>>>> <https://issues.apache.org/jira/browse/FLINK-18682?filter=12349334>:
> > >>>>>>> Currently 3 blockers (2x Hive, 1x CI Infra)
> > >>>>>>>
> > >>>>>>> Test-Instabilities
> > >>>>>>> <https://issues.apache.org/jira/browse/FLINK-18869?filter=12348580>
> > >>>>>>> (total 79) which failed recently or frequently:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - FLINK-18807 <https://issues.apache.org/jira/browse/FLINK-18807>
> > >>>>>>> FlinkKafkaProducerITCase.testScaleUpAfterScalingDown
> > >>>>>>> failed with "Timeout expired after 60000milliseconds while awaiting
> > >>>>>>> EndTxn(COMMIT)"
> > >>>>>>>
> > >>>>>>> - FLINK-18634 <https://issues.apache.org/jira/browse/FLINK-18634>
> > >>>>>>> FlinkKafkaProducerITCase.testRecoverCommittedTransaction
> > >>>>>>> failed with "Timeout expired after 60000milliseconds while awaiting
> > >>>>>>> InitProducerId"
> > >>>>>>>
> > >>>>>>> - FLINK-16908 <https://issues.apache.org/jira/browse/FLINK-16908>
> > >>>>>>> FlinkKafkaProducerITCase
> > >>>>>>> testScaleUpAfterScalingDown Timeout expired while initializing
> > >>>>>>> transactional state in 60000ms.
> > >>>>>>>
> > >>>>>>> - FLINK-13733 <https://issues.apache.org/jira/browse/FLINK-13733>
> > >>>>>>> FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis
> > >>>>>>>
> > >>>>>>> --> The first three tickets seem related.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - FLINK-17260 <https://issues.apache.org/jira/browse/FLINK-17260>
> > >>>>>>> StreamingKafkaITCase failure on Azure
> > >>>>>>>
> > >>>>>>> --> This one seems really hard to reproduce
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - FLINK-16768 <https://issues.apache.org/jira/browse/FLINK-16768>
> > >>>>>>> HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart
> > >>>>>>> hangs
> > >>>>>>>
> > >>>>>>> - FLINK-18374 <https://issues.apache.org/jira/browse/FLINK-18374>
> > >>>>>>> HadoopS3RecoverableWriterITCase.testRecoverAfterMultiplePersistsStateWithMultiPart
> > >>>>>>> produced no output for 900 seconds
> > >>>>>>>
> > >>>>>>> --> Nobody seems to feel responsible for these tickets. My guess is
> > >>>>>>> that the S3 connector should have shorter timeouts / faster retries to
> > >>>>>>> finish within the 15-minute test timeout. Or there is really something
> > >>>>>>> wrong with the code.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - FLINK-18333 <https://issues.apache.org/jira/browse/FLINK-18333>
> > >>>>>>> UnsignedTypeConversionITCase failed caused by MariaDB4j
> > >>>>>>> "Asked to waitFor Program"
> > >>>>>>>
> > >>>>>>> - FLINK-17159 <https://issues.apache.org/jira/browse/FLINK-17159> ES6
> > >>>>>>> ElasticsearchSinkITCase unstable
> > >>>>>>>
> > >>>>>>> - FLINK-17949 <https://issues.apache.org/jira/browse/FLINK-17949>
> > >>>>>>> KafkaShuffleITCase.testSerDeIngestionTime:156->testRecordSerDe:388
> > >>>>>>> expected:<310> but was:<0>
> > >>>>>>>
> > >>>>>>> - FLINK-18222 <https://issues.apache.org/jira/browse/FLINK-18222>
> > >>>>>>> "Avro Confluent Schema Registry nightly end-to-end test" unstable with
> > >>>>>>> "Kafka cluster did not start after 120 seconds"
> > >>>>>>>
> > >>>>>>> - FLINK-17511 <https://issues.apache.org/jira/browse/FLINK-17511>
> > >>>>>>> "RocksDB Memory Management end-to-end test" fails with "Current block
> > >>>>>>> cache usage 202123272 larger than expected memory limit 200000000"
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Mon, Jul 27, 2020 at 8:42 PM Robert Metzger <rmetz...@apache.org>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi team,
> > >>>>>>>>
> > >>>>>>>> We would like to use this thread as a permanent thread for
> > >>>>>>>> regularly syncing on stale blockers (need to have somebody assigned
> > >>>>>>>> within a week and progress, or a good plan) and build instabilities
> > >>>>>>>> (need to check if it's a blocker).
> > >>>>>>>>
> > >>>>>>>> Recent test-instabilities:
> > >>>>>>>>
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-17159 (ES6 test)
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-16768 (s3 test unstable)
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-18374 (s3 test unstable)
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-17949
> > >>>>>>>> (KafkaShuffleITCase)
> > >>>>>>>> - https://issues.apache.org/jira/browse/FLINK-18634 (Kafka
> > >>>>>>>> transactions)
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> It would be nice if the committers taking care of these components
> > >>>>>>>> could look into the test failures.
> > >>>>>>>> If nothing happens, we'll personally reach out to the people we
> > >>>>>>>> believe could look into the tickets.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Dian & Robert
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Best regards!
> > >>>>> Rui Li
> > >>>>
> > >>>>
> > >>>
> > >>> --
> > >>> Best regards!
> > >>> Rui Li
> > >>>
> > >>
> >
> >
>
