@Andrey about the increase in metaspace size: I have no concerns for 1.11.0. For 1.10.1 I am not completely sure, because users expect to be able to upgrade to it without adjusting their configuration, and that might not be possible with this change.
On Thu, Mar 12, 2020 at 12:55 PM Andrey Zagrebin <azagrebin.apa...@gmail.com> wrote:

> About "FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job"

My understanding is that the issue is basically covered by:

- [FLINK-16225] Metaspace Out Of Memory should be handled as Fatal Error in TaskManager
  There is no full consensus there, but we could at least improve the error message in the existing fatal error handling of the task thread.

- [FLINK-16406] Increase default value for JVM Metaspace to minimise its OutOfMemoryError
  See further below.

- [FLINK-16246] Exclude "SdkMBeanRegistrySupport" from dynamically loaded AWS connectors
  Not sure whether this is a blocker, but it looks close to being resolved.

> About "FLINK-16406 Increase default value for JVM Metaspace"
> - Do we have consensus that this is okay for a bugfix release? It changes
> setups, takes away memory from heap / managed memory on existing setups
> that keep their flink-conf.yaml.

My understanding was that increasing it to 256m resolved the reported problems and that we had decided to make the change, so I merged it today as there were no more concerns. If there are concerns, I can revert it.

On the other hand, I think improving the error message with a reference to the metaspace option should help the most, because users would not have to read all the docs to fix it; then maybe this change is not even needed.

Best,
Andrey

On 12 Mar 2020, at 12:28, Stephan Ewen <se...@apache.org> wrote:

Good idea to go ahead with 1.10.1.

About "FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job"
- I don't think we have consensus on the exact solution yet, and some of the changes might also have side effects that are hard to predict, so I am not sure we should rush this in.

About "FLINK-16406 Increase default value for JVM Metaspace"
- Do we have consensus that this is okay for a bugfix release? It changes setups, takes away memory from heap / managed memory on existing setups that keep their flink-conf.yaml.

We may need to unblock the release from these two issues and think about having 1.10.2 in the near future.

On Thu, Mar 12, 2020 at 7:15 AM Yu Li <car...@gmail.com> wrote:

Thanks for the reminder, Jark. Will keep an eye on these two.

Best Regards,
Yu

On Thu, 12 Mar 2020 at 12:32, Jark Wu <imj...@gmail.com> wrote:

Thanks for driving this release, Yu!
+1 to start the 1.10.1 release cycle.

From the Table SQL module, I think we should also try to get in the following issues:
- FLINK-16441: Allow users to override flink-conf parameters from SQL CLI environment
  This allows users to set e.g. the state backend, the watermark interval, and exactly-once/at-least-once mode in the SQL CLI.
- FLINK-16217: SQL Client crashed when any uncatched exception is thrown
  This will greatly improve the experience of using the SQL CLI.

Best,
Jark

On Wed, 11 Mar 2020 at 20:37, Yu Li <car...@gmail.com> wrote:

Thanks for the suggestion, Andrey! I've added 1.10.1 to the FLINK-16225 fix versions and promoted its priority to Critical. Will also watch the progress of FLINK-16108/FLINK-16408.

Best Regards,
Yu

On Wed, 11 Mar 2020 at 18:18, Andrey Zagrebin <azagre...@apache.org> wrote:

Hi Yu,

Thanks for kicking off the 1.10.1 release discussion!

Apart from
- FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError,
  which should be merged soon,

I think we should also try to get in the following issues:

- [FLINK-16225] Metaspace Out Of Memory should be handled as Fatal Error in TaskManager
  This should solve the Metaspace problem in an even better way, because an OOM failure should point users to the docs immediately.

- [FLINK-16408] Bind user code class loader to lifetime of a slot
  This should give better protection against class loading leaks.

- [FLINK-16018] Improve error reporting when submitting batch job (instead of AskTimeoutException)
  This problem has recently happened to multiple users.

Best,
Andrey

On Wed, Mar 11, 2020 at 8:46 AM Jingsong Li <jingsongl...@gmail.com> wrote:

Thanks for driving this, Yu. +1 for starting the 1.10.1 release.

Some issues are very important; users are looking forward to them.

Best,
Jingsong Lee

On Wed, Mar 11, 2020 at 2:52 PM Yangze Guo <karma...@gmail.com> wrote:

Thanks for driving this release, Yu!

+1 for starting the 1.10.1 release cycle.

Best,
Yangze Guo

On Wed, Mar 11, 2020 at 1:42 PM Xintong Song <tonysong...@gmail.com> wrote:

Yu,
Thanks for the explanation.
I have no concerns. I was just trying to get some input for prioritizing tasks on my side, and ~1 month sounds good to me.

Thank you~

Xintong Song

On Wed, Mar 11, 2020 at 12:15 PM Yu Li <car...@gmail.com> wrote:

bq. what is the time plan for 1.10.1?

According to the history, the first patch release of a major version takes ~1 month from the start of the discussion, depending on how quickly the blocker issues are resolved:
* 1.8.1: discussion started on May 28th [1], released on Jul 3rd [2]
* 1.9.1: discussion started on Sep 23rd [3], released on Oct 19th [4]

We won't rush to match the history, of course, but we can use it as a reference. Xintong, please feel free to let me know if you have any concerns. Thanks.

Best Regards,
Yu

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-8-1-td29154.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-8-1-released-td30124.html
[3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-9-1-td33343.html
[4] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-9-1-released-td34170.html

On Wed, 11 Mar 2020 at 11:54, Xintong Song <tonysong...@gmail.com> wrote:

Thanks, Yu, for the kick-off and for volunteering to be the release manager.

+1 for the proposal.

One quick question: what is the time plan for 1.10.1?

Thank you~

Xintong Song

On Wed, Mar 11, 2020 at 11:51 AM Zhijiang <wangzhijiang...@aliyun.com.invalid> wrote:

Thanks for driving this release, Yu!
+1 on my side

Best,
Zhijiang

------------------------------------------------------------------
From:Yu Li <car...@gmail.com>
Send Time:2020 Mar. 10 (Tue.) 20:25
To:dev <dev@flink.apache.org>
Subject:Re: [DISCUSS] Releasing Flink 1.10.1

Thanks for the supplement, Hequn. Yes, I will also keep an eye on these existing blocker issues.

Best Regards,
Yu

On Tue, 10 Mar 2020 at 19:10, Hequn Cheng <he...@apache.org> wrote:

Hi Yu,

Thanks a lot for raising the discussion and volunteering as the release manager!

I found some other issues [1] that are marked as blockers:
- FLINK-16454 Update the copyright year in NOTICE files
- FLINK-16262 Class loader problem with FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
- FLINK-16170 SearchTemplateRequest ClassNotFoundException when use flink-sql-connector-elasticsearch7
- FLINK-16018 Improve error reporting when submitting batch job (instead of AskTimeoutException)

These may also need to be resolved in 1.10.1.

Best,
Hequn

[1] https://issues.apache.org/jira/projects/FLINK/versions/12346891

On Tue, Mar 10, 2020 at 6:48 PM Yu Li <car...@gmail.com> wrote:

Hi Jincheng,

Yes, your help would be very welcome. Thanks a lot!

Best Regards,
Yu

On Tue, 10 Mar 2020 at 18:24, jincheng sun <sunjincheng...@gmail.com> wrote:

Thanks for bringing up the discussion, Yu. I would like to give you a hand at the last stage, when the RC is finished (if you need it). :)

Best,
Jincheng

On Tue, Mar 10, 2020 at 5:49 PM Yu Li <car...@gmail.com> wrote:

Hi All,

It has been almost one month since we released Flink 1.10.0. We already have more than 40 resolved improvements/bugs in the release-1.10 branch, and I propose to start the 1.10.1 release cycle.

The most notable fixes are:

- FLINK-16241 [legal] Remove the license and notice file in flink-ml-lib module
- FLINK-16313 Fix RocksDB resource leak in flink-state-processor-api
- FLINK-16161 Statistics zero should be known in HiveCatalog
- FLINK-2336 ArrayIndexOufOBoundsException in TypeExtractor when mapping
- FLINK-16108 StreamSQLExample is failed if running in blink planner
- FLINK-16139 Co-location constraints are not reset on task recovery in DefaultScheduler
- FLINK-16414 Create udaf/udtf function using sql casuing ValidationException: SQL validation failed

Furthermore, I think the following issues should be merged before the 1.10.1 release (especially the Metaspace OOM issue):

- FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job submission
- FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError
- FLINK-16047 Blink planner produces wrong aggregate results with state clean up
- FLINK-16070 Blink planner can not extract correct unique key for UpsertStreamTableSink

I would volunteer as the release manager and kick off the release process once the blocker issues are merged. What do you think?

If there are any concerns, or missing blocker issues that need to be fixed in 1.10.1, please let me know. Thanks.

Best Regards,
Yu