> About "FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job"

My understanding is that the issue is basically covered by:

- [FLINK-16225] Metaspace Out Of Memory should be handled as Fatal Error in TaskManager
   There is no full consensus there yet, but we could at least improve the
   error message in the existing fatal error handling of the task thread.

- [FLINK-16406] Increase default value for JVM Metaspace to minimise its OutOfMemoryError
   See below.

- [FLINK-16246] Exclude "SdkMBeanRegistrySupport" from dynamically loaded AWS connectors
   I am not sure whether this is a blocker, but it looks close to being resolved.

> About "FLINK-16406 Increase default value for JVM Metaspace"
>  - Have we consensus that this is okay for a bugfix release? It changes
> setups, takes away memory from heap / managed memory on existing setups
> that keep their flink-conf.yaml.

My understanding was that increasing the default to 256m resolved the reported
problems and that we had decided to make the change, so I merged it today as
there were no more concerns.
If there are concerns, I can revert it.

On the other hand, I think improving the error message with a reference to the
metaspace option should help the most, because users would not have to read all
the docs to fix the problem. In that case, maybe this change is not even needed.
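
For illustration, a minimal sketch of what the user-side fix could look like in
flink-conf.yaml. The key taskmanager.memory.jvm-metaspace.size is the metaspace
option as I recall it from the 1.10 memory configuration docs, so please
double-check it there; 256m is simply the value discussed above, not a tuning
recommendation.

```yaml
# flink-conf.yaml (sketch; verify the key against the Flink 1.10 memory docs)
# 256m is the value discussed in this thread, not a recommendation.
taskmanager.memory.jvm-metaspace.size: 256m
```

An error message that names this option would let users apply such a change
without searching the docs.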

Best,
Andrey

> On 12 Mar 2020, at 12:28, Stephan Ewen <se...@apache.org> wrote:
> 
> Good idea to go ahead with 1.10.1
> 
> About "FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job"
>  - I don't think we have consensus on the exact solution, yet, and some of
> the changes might also have side effects that are hard to predict, so I am
> not sure we should rush this in.
> 
> About "FLINK-16406 Increase default value for JVM Metaspace"
>  - Have we consensus that this is okay for a bugfix release? It changes
> setups, takes away memory from heap / managed memory on existing setups
> that keep their flink-conf.yaml.
> 
> We may need to unblock the release from these two issues and think about
> having 1.10.2 in the near future.
> 
> On Thu, Mar 12, 2020 at 7:15 AM Yu Li <car...@gmail.com> wrote:
> 
>> Thanks for the reminder Jark. Will keep an eye on these two.
>> 
>> Best Regards,
>> Yu
>> 
>> 
>> On Thu, 12 Mar 2020 at 12:32, Jark Wu <imj...@gmail.com> wrote:
>> 
>>> Thanks for driving this release, Yu!
>>> +1 to start 1.10.1 release cycle.
>>> 
>>> From the Table SQL module, I think we should also try to get in the
>>> following issues:
>>> - FLINK-16441: Allow users to override flink-conf parameters from SQL CLI environment
>>>   This allows users to set e.g. the state backend, watermark interval, and
>>>   exactly-once/at-least-once mode in the SQL CLI.
>>> - FLINK-16217: SQL Client crashed when any uncatched exception is thrown
>>>   This will greatly improve the experience of using the SQL CLI.
>>> 
>>> Best,
>>> Jark
>>> 
>>> 
>>> On Wed, 11 Mar 2020 at 20:37, Yu Li <car...@gmail.com> wrote:
>>> 
>>>> Thanks for the suggestion Andrey! I've added 1.10.1 to the fix versions of
>>>> FLINK-16225 and promoted its priority to Critical. Will also watch the
>>>> progress of FLINK-16108/FLINK-16408.
>>>> 
>>>> Best Regards,
>>>> Yu
>>>> 
>>>> 
>>>> On Wed, 11 Mar 2020 at 18:18, Andrey Zagrebin <azagre...@apache.org> wrote:
>>>> 
>>>>> Hi Yu,
>>>>> 
>>>>> Thanks for kicking off the 1.10.1 release discussion!
>>>>> 
>>>>> Apart from
>>>>> - FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError
>>>>> which should be merged soon,
>>>>> 
>>>>> I think we should also try to get in the following issues:
>>>>> 
>>>>> - [FLINK-16225] Metaspace Out Of Memory should be handled as Fatal Error in TaskManager
>>>>> This should solve the Metaspace problem in an even better way, because the
>>>>> OOM failure should point users to the docs immediately.
>>>>> 
>>>>> - [FLINK-16408] Bind user code class loader to lifetime of a slot
>>>>> This should give better protection against class loading leaks.
>>>>> 
>>>>> - [FLINK-16018] Improve error reporting when submitting batch job (instead of AskTimeoutException)
>>>>> This problem has recently occurred for multiple users.
>>>>> 
>>>>> Best,
>>>>> Andrey
>>>>> 
>>>>> 
>>>>> On Wed, Mar 11, 2020 at 8:46 AM Jingsong Li <jingsongl...@gmail.com> wrote:
>>>>> 
>>>>>> Thanks for driving, Yu. +1 for starting the 1.10.1 release.
>>>>>> 
>>>>>> Some issues are very important; users are looking forward to them.
>>>>>> 
>>>>>> Best,
>>>>>> Jingsong Lee
>>>>>> 
>>>>>> On Wed, Mar 11, 2020 at 2:52 PM Yangze Guo <karma...@gmail.com> wrote:
>>>>>> 
>>>>>>> Thanks for driving this release, Yu!
>>>>>>> 
>>>>>>> +1 for starting the 1.10.1 release cycle.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Yangze Guo
>>>>>>> 
>>>>>>> On Wed, Mar 11, 2020 at 1:42 PM Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Yu,
>>>>>>>> Thanks for the explanation.
>>>>>>>> I have no concerns. I was just trying to get some input for prioritizing
>>>>>>>> tasks on my side, and ~1 month sounds good to me.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thank you~
>>>>>>>> 
>>>>>>>> Xintong Song
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Mar 11, 2020 at 12:15 PM Yu Li <car...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> bq. what is the time plan for 1.10.1?
>>>>>>>>> 
>>>>>>>>> Historically, the first patch release of a major version has taken
>>>>>>>>> ~1 month from the start of the discussion, depending on how quickly
>>>>>>>>> blocker issues are resolved:
>>>>>>>>> * 1.8.1: started discussion on May 28th [1], released on Jul 3rd [2]
>>>>>>>>> * 1.9.1: started discussion on Sep 23rd [3], released on Oct 19th [4]
>>>>>>>>> 
>>>>>>>>> We won't rush to match the history of course, but could use it as a
>>>>>>>>> reference. Please feel free to let me know if you have any concerns,
>>>>>>>>> Xintong. Thanks.
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> Yu
>>>>>>>>> 
>>>>>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-8-1-td29154.html
>>>>>>>>> [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-8-1-released-td30124.html
>>>>>>>>> [3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-9-1-td33343.html
>>>>>>>>> [4] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-9-1-released-td34170.html
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, 11 Mar 2020 at 11:54, Xintong Song <tonysong...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks Yu, for the kick-off and for volunteering to be the release
>>>>>>>>>> manager.
>>>>>>>>>> 
>>>>>>>>>> +1 for the proposal.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> One quick question, what is the time plan for 1.10.1?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thank you~
>>>>>>>>>> 
>>>>>>>>>> Xintong Song
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Mar 11, 2020 at 11:51 AM Zhijiang
>>>>>>>>>> <wangzhijiang...@aliyun.com.invalid> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks for driving this release, Yu!
>>>>>>>>>>> +1 on my side
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Zhijiang
>>>>>>>>>>> 
>>>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>>> From:Yu Li <car...@gmail.com>
>>>>>>>>>>> Send Time:2020 Mar. 10 (Tue.) 20:25
>>>>>>>>>>> To:dev <dev@flink.apache.org>
>>>>>>>>>>> Subject:Re: [DISCUSS] Releasing Flink 1.10.1
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the supplement Hequn. Yes, I will also keep an eye on these
>>>>>>>>>>> existing blocker issues.
>>>>>>>>>>> 
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Yu
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, 10 Mar 2020 at 19:10, Hequn Cheng <he...@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Yu,
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks a lot for raising the discussion and volunteering as the
>>>>>>>>>>>> release manager!
>>>>>>>>>>>> 
>>>>>>>>>>>> I found some other issues [1] which are marked as blockers:
>>>>>>>>>>>> - FLINK-16454 Update the copyright year in NOTICE files
>>>>>>>>>>>> - FLINK-16262 Class loader problem with FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
>>>>>>>>>>>> - FLINK-16170 SearchTemplateRequest ClassNotFoundException when use flink-sql-connector-elasticsearch7
>>>>>>>>>>>> - FLINK-16018 Improve error reporting when submitting batch job (instead of AskTimeoutException)
>>>>>>>>>>>> 
>>>>>>>>>>>> These may also need to be resolved in 1.10.1.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Hequn
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] https://issues.apache.org/jira/projects/FLINK/versions/12346891
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Mar 10, 2020 at 6:48 PM Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, your help would be much appreciated. Thanks a lot!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Yu
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 18:24, jincheng sun <sunjincheng...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for bringing up the discussion, Yu. I would like to give you a
>>>>>>>>>>>>>> hand at the last stage when the RC is finished (if you need it). :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Mar 10, 2020 at 5:49 PM Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> It has been almost one month since we released Flink 1.10.0. We
>>>>>>>>>>>>>>> already have more than 40 resolved improvements/bugs in the
>>>>>>>>>>>>>>> release-1.10 branch, and I propose to start the 1.10.1 release cycle.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The most noticeable fixes are:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - FLINK-16241 [legal] Remove the license and notice file in flink-ml-lib module
>>>>>>>>>>>>>>> - FLINK-16313 Fix RocksDB resource leak in flink-state-processor-api
>>>>>>>>>>>>>>> - FLINK-16161 Statistics zero should be known in HiveCatalog
>>>>>>>>>>>>>>> - FLINK-2336 ArrayIndexOufOBoundsException in TypeExtractor when mapping
>>>>>>>>>>>>>>> - FLINK-16108 StreamSQLExample is failed if running in blink planner
>>>>>>>>>>>>>>> - FLINK-16139 Co-location constraints are not reset on task recovery in DefaultScheduler
>>>>>>>>>>>>>>> - FLINK-16414 Create udaf/udtf function using sql casuing ValidationException: SQL validation failed
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Furthermore, I think the following issues should be merged before the
>>>>>>>>>>>>>>> 1.10.1 release (especially the Metaspace OOM issue):
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job submission
>>>>>>>>>>>>>>> - FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError
>>>>>>>>>>>>>>> - FLINK-16047 Blink planner produces wrong aggregate results with state clean up
>>>>>>>>>>>>>>> - FLINK-16070 Blink planner can not extract correct unique key for UpsertStreamTableSink
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I would volunteer as the release manager and kick off the release
>>>>>>>>>>>>>>> process once the blocker issues are merged. What do you think?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If there are any concerns, or any missing blocker issues that need to
>>>>>>>>>>>>>>> be fixed in 1.10.1, please let me know. Thanks.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Yu
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best, Jingsong Lee
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
