Ok, makes sense, I will revert it for 1.10 and only try to improve the error message and docs.

> On 12 Mar 2020, at 13:15, Stephan Ewen <se...@apache.org> wrote:
>
> @Andrey about the increase in metaspace size
>   - I have no concerns for 1.11.0.
>   - For 1.10.1 I am not completely sure, because users expect to upgrade
>     that without config adjustments. That might not be possible with that
>     change.
>
> On Thu, Mar 12, 2020 at 12:55 PM Andrey Zagrebin <azagrebin.apa...@gmail.com> wrote:
>
>>> About "FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job"
>>
>> My understanding is that the issue is basically covered by:
>>
>> - [FLINK-16225] Metaspace Out Of Memory should be handled as Fatal Error in TaskManager
>>   No full consensus there, but improving the error message for the existing task
>>   thread fatal handling could be done at least.
>>
>> - [FLINK-16406] Increase default value for JVM Metaspace to minimise its OutOfMemoryError
>>   See further below.
>>
>> - [FLINK-16246] Exclude "SdkMBeanRegistrySupport" from dynamically loaded AWS connectors
>>   Not sure whether this is a blocker, but it looks close to being resolved.
>>
>>> About "FLINK-16406 Increase default value for JVM Metaspace"
>>>   - Have we consensus that this is okay for a bugfix release? It changes
>>>     setups, takes away memory from heap / managed memory on existing setups
>>>     that keep their flink-conf.yaml.
>>
>> My understanding was that increasing to 256m resolved the reported problems
>> and we decided to make the change, so I merged it today as there were
>> no more concerns.
>> If there are concerns I can revert it.
>>
>> On the other hand, I think improving the error message with a reference to
>> the metaspace option should help the most,
>> because users would not have to read all the docs to fix it;
>> then maybe this change is not even needed.
>>
>> Best,
>> Andrey
>>
>>> On 12 Mar 2020, at 12:28, Stephan Ewen <se...@apache.org> wrote:
>>>
>>> Good idea to go ahead with 1.10.1
>>>
>>> About "FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job"
>>>   - I don't think we have consensus on the exact solution yet, and some of
>>>     the changes might also have side effects that are hard to predict, so I am
>>>     not sure we should rush this in.
>>>
>>> About "FLINK-16406 Increase default value for JVM Metaspace"
>>>   - Have we consensus that this is okay for a bugfix release? It changes
>>>     setups, takes away memory from heap / managed memory on existing setups
>>>     that keep their flink-conf.yaml.
>>>
>>> We may need to unblock the release from these two issues and think about
>>> having 1.10.2 in the near future.
>>>
>>> On Thu, Mar 12, 2020 at 7:15 AM Yu Li <car...@gmail.com> wrote:
>>>
>>>> Thanks for the reminder Jark. Will keep an eye on these two.
>>>>
>>>> Best Regards,
>>>> Yu
>>>>
>>>> On Thu, 12 Mar 2020 at 12:32, Jark Wu <imj...@gmail.com> wrote:
>>>>
>>>>> Thanks for driving this release, Yu!
>>>>> +1 to start 1.10.1 release cycle.
>>>>>
>>>>> From the Table SQL module, I think we should also try to get in the
>>>>> following issues:
>>>>> - FLINK-16441: Allow users to override flink-conf parameters from SQL CLI environment
>>>>>   this allows users to set e.g. statebackend, watermark interval,
>>>>>   exactly-once/at-least-once, in the SQL CLI
>>>>> - FLINK-16217: SQL Client crashed when any uncatched exception is thrown
>>>>>   this will much improve the experience when using the SQL CLI
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Wed, 11 Mar 2020 at 20:37, Yu Li <car...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the suggestion Andrey! I've added 1.10.1 into FLINK-16225 fix
>>>>>> versions and promoted its priority to Critical. Will also watch the
>>>>>> progress of FLINK-16108/FLINK-16408.
>>>>>>
>>>>>> Best Regards,
>>>>>> Yu
>>>>>>
>>>>>> On Wed, 11 Mar 2020 at 18:18, Andrey Zagrebin <azagre...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Yu,
>>>>>>>
>>>>>>> Thanks for kicking off the 1.10.1 release discussion!
>>>>>>>
>>>>>>> Apart from
>>>>>>> - FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError
>>>>>>>   which should be merged soon
>>>>>>>
>>>>>>> I think we should also try to get in the following issues:
>>>>>>>
>>>>>>> - [FLINK-16225] Metaspace Out Of Memory should be handled as Fatal Error in TaskManager
>>>>>>>   This should solve the Metaspace problem in an even better way, because the OOM
>>>>>>>   failure should point users to the docs immediately
>>>>>>>
>>>>>>> - [FLINK-16408] Bind user code class loader to lifetime of a slot
>>>>>>>   This should give better protection against class loading leaks
>>>>>>>
>>>>>>> - [FLINK-16018] Improve error reporting when submitting batch job (instead of AskTimeoutException)
>>>>>>>   This problem has recently happened for multiple users
>>>>>>>
>>>>>>> Best,
>>>>>>> Andrey
>>>>>>>
>>>>>>> On Wed, Mar 11, 2020 at 8:46 AM Jingsong Li <jingsongl...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for driving, Yu. +1 for starting the 1.10.1 release.
>>>>>>>>
>>>>>>>> Some issues are very important; users are looking forward to them.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jingsong Lee
>>>>>>>>
>>>>>>>> On Wed, Mar 11, 2020 at 2:52 PM Yangze Guo <karma...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for driving this release, Yu!
>>>>>>>>>
>>>>>>>>> +1 for starting the 1.10.1 release cycle.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Yangze Guo
>>>>>>>>>
>>>>>>>>> On Wed, Mar 11, 2020 at 1:42 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yu,
>>>>>>>>>> Thanks for the explanation.
>>>>>>>>>> I've no concerns.
>>>>>>>>>> I was just trying to get some inputs for prioritizing
>>>>>>>>>> tasks on my side, and ~1 month sounds good to me.
>>>>>>>>>>
>>>>>>>>>> Thank you~
>>>>>>>>>>
>>>>>>>>>> Xintong Song
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 11, 2020 at 12:15 PM Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> bq. what is the time plan for 1.10.1?
>>>>>>>>>>>
>>>>>>>>>>> According to the history, the first patch release of a major version will
>>>>>>>>>>> take ~1 month from when the discussion started, depending on how fast
>>>>>>>>>>> blocker issues get resolved:
>>>>>>>>>>> * 1.8.1: started discussion on May 28th [1], released on Jul 3rd [2]
>>>>>>>>>>> * 1.9.1: started discussion on Sep 23rd [3], released on Oct 19th [4]
>>>>>>>>>>>
>>>>>>>>>>> We won't rush to match the history of course, but could use it as a
>>>>>>>>>>> reference. And please feel free to let me know if you have any concerns, Xintong.
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Yu
>>>>>>>>>>>
>>>>>>>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-8-1-td29154.html
>>>>>>>>>>> [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-8-1-released-td30124.html
>>>>>>>>>>> [3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-9-1-td33343.html
>>>>>>>>>>> [4] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-9-1-released-td34170.html
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 11 Mar 2020 at 11:54, Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Yu, for the kick off and volunteering to be the release manager.
>>>>>>>>>>>>
>>>>>>>>>>>> +1 for the proposal.
>>>>>>>>>>>>
>>>>>>>>>>>> One quick question, what is the time plan for 1.10.1?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you~
>>>>>>>>>>>>
>>>>>>>>>>>> Xintong Song
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 11, 2020 at 11:51 AM Zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for driving this release, Yu!
>>>>>>>>>>>>> +1 on my side
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Zhijiang
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>>>>> From:Yu Li <car...@gmail.com>
>>>>>>>>>>>>> Send Time:2020 Mar. 10 (Tue.) 20:25
>>>>>>>>>>>>> To:dev <dev@flink.apache.org>
>>>>>>>>>>>>> Subject:Re: [DISCUSS] Releasing Flink 1.10.1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the supplement Hequn. Yes, will also keep an eye on these
>>>>>>>>>>>>> existing blocker issues.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Yu
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 19:10, Hequn Cheng <he...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Yu,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks a lot for raising the discussion and volunteering as the release
>>>>>>>>>>>>>> manager!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I found there are some other issues [1] which are marked as blockers:
>>>>>>>>>>>>>> - FLINK-16454 Update the copyright year in NOTICE files
>>>>>>>>>>>>>> - FLINK-16262 Class loader problem with FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
>>>>>>>>>>>>>> - FLINK-16170 SearchTemplateRequest ClassNotFoundException when use flink-sql-connector-elasticsearch7
>>>>>>>>>>>>>> - FLINK-16018 Improve error reporting when submitting batch job (instead of AskTimeoutException)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> These may also need to be resolved in 1.10.1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Hequn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/projects/FLINK/versions/12346891
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 10, 2020 at 6:48 PM Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, your help would be very helpful. Thanks a lot!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Yu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 18:24, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for bringing up the discussion, Yu. I would like to give you a hand
>>>>>>>>>>>>>>>> at the last stage when the RC is finished (if you need). :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 17:49, Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It has been almost one month since we released Flink 1.10.0. We already
>>>>>>>>>>>>>>>>> have more than 40 resolved improvements/bugs in the release-1.10 branch,
>>>>>>>>>>>>>>>>> and I propose to start the 1.10.1 release cycle.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Most noticeable fixes are:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - FLINK-16241 [legal] Remove the license and notice file in flink-ml-lib module
>>>>>>>>>>>>>>>>> - FLINK-16313 Fix RocksDB resource leak in flink-state-processor-api
>>>>>>>>>>>>>>>>> - FLINK-16161 Statistics zero should be known in HiveCatalog
>>>>>>>>>>>>>>>>> - FLINK-2336 ArrayIndexOufOBoundsException in TypeExtractor when mapping
>>>>>>>>>>>>>>>>> - FLINK-16108 StreamSQLExample is failed if running in blink planner
>>>>>>>>>>>>>>>>> - FLINK-16139 Co-location constraints are not reset on task recovery in DefaultScheduler
>>>>>>>>>>>>>>>>> - FLINK-16414 Create udaf/udtf function using sql casuing ValidationException: SQL validation failed
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Furthermore, I think the following issues should be merged before the 1.10.1
>>>>>>>>>>>>>>>>> release (especially the Metaspace OOM issue):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job submission
>>>>>>>>>>>>>>>>> - FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError
>>>>>>>>>>>>>>>>> - FLINK-16047 Blink planner produces wrong aggregate results with state clean up
>>>>>>>>>>>>>>>>> - FLINK-16070 Blink planner can not extract correct unique key for UpsertStreamTableSink
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would volunteer as the release manager and kick off the release process
>>>>>>>>>>>>>>>>> once blocker issues are merged. What do you think?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If there are any concerns or missing blocker issues that need to be fixed in
>>>>>>>>>>>>>>>>> 1.10.1, please let me know. Thanks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Yu
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best, Jingsong Lee