I'd be fine with these changes. Thanks for the summary, Xintong.

Cheers,
Till
On Wed, Jan 15, 2020 at 11:09 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thank you all for the good discussion.
>
> If there are no further concerns or objections, I would like to conclude
> this thread with the following action items.
>
>    - Change the default value of "taskmanager.memory.jvm-overhead.min"
>    to 192MB.
>    - Change the default value of "taskmanager.memory.jvm-metaspace.size"
>    to 96MB.
>    - Change the value of "taskmanager.memory.process.size" in the
>    default "flink-conf.yaml" to 1568MB.
>    - Relax the JVM overhead sanity check, so that the fraction does not
>    need to be strictly followed, as long as the min/max range is
>    respected.
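>
> To make this concrete, the affected entries would look roughly as
> follows (a sketch of the intended defaults, not the final file; only the
> keys touched by the action items are shown, and the metaspace/overhead
> values may end up as built-in defaults rather than entries in the
> shipped "flink-conf.yaml"):
>
>     taskmanager.memory.process.size: 1568m
>     taskmanager.memory.jvm-metaspace.size: 96m
>     taskmanager.memory.jvm-overhead.min: 192m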
>
> Thank you~
>
> Xintong Song
>
>
> On Wed, Jan 15, 2020 at 5:50 PM Xintong Song <tonysong...@gmail.com>
> wrote:
>
> > There's one more idea from the offline discussion with Andrey.
> >
> > If we decide to make the metaspace 96MB, we can also make process.size
> > 1568MB (1.5GB + 32MB).
> > According to the spreadsheet
> > <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0>,
> > a 1.5GB process size and 64MB metaspace result in memory sizes whose
> > values are powers of 2.
> > When increasing the metaspace from 64MB to 96MB, it would be good to
> > preserve that alignment, for better readability when we later explain
> > the memory configuration and calculations in the documentation.
> > I believe the difference between 1.5GB and 1.5GB + 32MB is negligible
> > for memory consumption.
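> >
> > To illustrate the alignment, here is a rough sketch of the derivation
> > (simplified, not Flink's actual code; the overhead/network fractions
> > and min/max values are the defaults as I understand them, and integer
> > MB arithmetic stands in for the real byte-precise calculation):
> >
> >     // Sketch of the FLIP-49 memory derivation, in MB.
> >     public class MemoryBreakdown {
> >         public static void main(String[] args) {
> >             long process = 1568;   // proposed process.size (1.5GB + 32MB)
> >             long metaspace = 96;   // proposed jvm-metaspace.size
> >             // JVM overhead: fraction 0.1, clamped to [192MB, 1GB].
> >             long overhead = clamp((long) (process * 0.1), 192, 1024);
> >             long flink = process - overhead - metaspace;          // 1280
> >             // Network memory: fraction 0.1, clamped to [64MB, 1GB].
> >             long network = clamp((long) (flink * 0.1), 64, 1024); // 128
> >             long managed = (long) (flink * 0.4);                  // 512
> >             long frameworkHeap = 128, frameworkOffHeap = 128;
> >             long taskHeap = flink - frameworkHeap - frameworkOffHeap
> >                     - network - managed;                          // 384
> >             System.out.printf("flink=%d managed=%d network=%d taskHeap=%d%n",
> >                     flink, managed, network, taskHeap);
> >         }
> >
> >         static long clamp(long v, long min, long max) {
> >             return Math.max(min, Math.min(max, v));
> >         }
> >     }
> >
> > With 1568MB process size and 96MB metaspace this prints the same
> > power-of-2 breakdown as 1536MB with 64MB metaspace: flink=1280,
> > managed=512, network=128, taskHeap=384.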
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > On Wed, Jan 15, 2020 at 11:55 AM Xintong Song <tonysong...@gmail.com>
> > wrote:
> >
> >> Thanks for the discussion, Stephan, Till and Andrey.
> >>
> >> +1 for the managed fraction (0.4) and process.size (1.5G).
> >>
> >> *JVM overhead min 196 -> 192Mb (128 + 64)*
> >>> small correction for better power-of-2 alignment of sizes
> >>
> >> Sorry, this was a typo (and the same for the JIRA comment, which is
> >> copy-pasted). It was 192MB that was used in the tuning report.
> >>
> >> *meta space at least 96Mb?*
> >>> There is still a concern about JVM metaspace being just 64MB.
> >>> We should confirm that it is not a problem by testing it also with
> >>> SQL jobs and the Blink planner.
> >>> Also, by running the tpc-ds e2e Flink tests with this setting.
> >>> Basically, wherever more classes are generated/loaded.
> >>> We can look into this tomorrow.
> >>
> >> I have already tried setting the metaspace to 64MB with the e2e tests,
> >> where I believe various sql / blink / tpc-ds test cases are included.
> >> (See https://travis-ci.com/flink-ci/flink/builds/142970113 )
> >> However, I'm also ok with 96MB, since we are increasing the
> >> process.size to 1.5G.
> >> My original concern with a larger metaspace size was that we might end
> >> up with a too small flink.size for the out-of-the-box configuration on
> >> containerized setups.
> >>
> >> *sanity check of JVM overhead*
> >>> When the explicitly configured process and flink memory sizes are
> >>> verified against the JVM metaspace and overhead, the JVM overhead
> >>> does not have to be the exact fraction.
> >>> It can be just within its min/max range, similar to how it is now for
> >>> the network/shuffle memory check after FLINK-15300.
> >>
> >> Also +1 for this.
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >> On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <azagre...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Stephan, Till and I had another offline discussion today. Here is the
> >>> outcome of our brainstorm.
> >>>
> >>> *managed fraction 0.4*
> >>> This just confirms what we already discussed here.
> >>>
> >>> *process.size = 1536Mb (1.5Gb)*
> >>> We agreed to have process.size in the default settings, with an
> >>> explanation of the flink.size alternative in the comment.
> >>> The suggestion is to increase it from 1024 to 1536MB. As you can see
> >>> in the earlier provided calculation spreadsheet, it will result in a
> >>> bigger JVM heap and managed memory (both ~0.5GB) for all new setups.
> >>> This should provide a good enough experience for trying out Flink.
> >>>
> >>> *JVM overhead min 196 -> 192Mb (128 + 64)*
> >>> small correction for better power-of-2 alignment of sizes
> >>>
> >>> *meta space at least 96Mb?*
> >>> There is still a concern about JVM metaspace being just 64MB.
> >>> We should confirm that it is not a problem by testing it also with
> >>> SQL jobs and the Blink planner.
> >>> Also, by running the tpc-ds e2e Flink tests with this setting.
> >>> Basically, wherever more classes are generated/loaded.
> >>> We can look into this tomorrow.
> >>>
> >>> *sanity check of JVM overhead*
> >>> When the explicitly configured process and flink memory sizes are
> >>> verified against the JVM metaspace and overhead, the JVM overhead
> >>> does not have to be the exact fraction.
> >>> It can be just within its min/max range, similar to how it is now for
> >>> the network/shuffle memory check after FLINK-15300.
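> >>>
> >>> For illustration, the relaxed check would look roughly like this (a
> >>> sketch with hypothetical names, not the actual code in
> >>> flink-runtime):
> >>>
> >>>     // Sketch: the JVM overhead derived from the explicitly
> >>>     // configured process/flink sizes only needs to fall within the
> >>>     // configured [min, max] range, instead of matching
> >>>     // process.size * overhead-fraction exactly.
> >>>     class JvmOverheadSanityCheck {
> >>>         // derivedOverheadMb = process.size - flink.size - metaspace
> >>>         static void check(long derivedOverheadMb, long minMb, long maxMb) {
> >>>             if (derivedOverheadMb < minMb || derivedOverheadMb > maxMb) {
> >>>                 throw new IllegalArgumentException(
> >>>                         "Derived JVM overhead " + derivedOverheadMb
> >>>                                 + "MB is outside the configured range ["
> >>>                                 + minMb + "MB, " + maxMb + "MB].");
> >>>             }
> >>>             // No exact-fraction check anymore.
> >>>         }
> >>>     }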
> >>>
> >>> Best,
> >>> Andrey
> >>>
> >>> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <se...@apache.org> wrote:
> >>>
> >>> > I like the idea of having a larger default "flink.size" in the
> >>> > config.yaml. Maybe we don't need to double it, but something like
> >>> > 1280m would be okay?
> >>> >
> >>> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin
> >>> > <azagre...@apache.org> wrote:
> >>> >
> >>> > > Hi all!
> >>> > >
> >>> > > Great that we have already tried out the new FLIP-49 with bigger
> >>> > > jobs.
> >>> > >
> >>> > > I am also +1 for the JVM metaspace and overhead changes.
> >>> > >
> >>> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more
> >>> > > managed memory for the RocksDB limiting case.
> >>> > >
> >>> > > In general, this looks mostly to be about memory distribution
> >>> > > between the JVM heap and managed off-heap memory.
> >>> > > Compared to the previous default setup, the JVM heap dropped
> >>> > > (especially for standalone), mostly due to moving managed memory
> >>> > > from heap to off-heap, and then also adding framework off-heap
> >>> > > memory.
> >>> > > In general, this can be the most important consequence for
> >>> > > beginners and those who rely on the default configuration.
> >>> > > This especially affects the legacy default configuration in
> >>> > > standalone mode, where heap.size falls back to flink.size, but
> >>> > > there it seems we cannot do much now.
> >>> > >
> >>> > > I prepared a spreadsheet
> >>> > > <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE>
> >>> > > to play with the numbers for the setups mentioned in the report.
> >>> > >
> >>> > > One idea would be to set the process size (or, respectively, a
> >>> > > smaller flink size) to a bigger default number, like 2048.
> >>> > > In this case, the derived absolute default JVM heap and managed
> >>> > > memory are close to the previous defaults, especially for managed
> >>> > > fraction 0.3.
> >>> > > This should align the defaults with the previous standalone
> >>> > > try-out experience, where the increased off-heap memory is not
> >>> > > strictly controlled by the environment anyway.
> >>> > > The consequence for container users who relied on and updated the
> >>> > > default configuration is that the containers will be requested at
> >>> > > double the size.
> >>> > >
> >>> > > Best,
> >>> > > Andrey
> >>> > >
> >>> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann
> >>> > > <trohrm...@apache.org> wrote:
> >>> > >
> >>> > > > +1 for the JVM metaspace and overhead changes.
> >>> > > >
> >>> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann
> >>> > > > <trohrm...@apache.org> wrote:
> >>> > > >
> >>> > > >> I guess one of the most important results of this experiment
> >>> > > >> is to have a good tuning guide available for users who are
> >>> > > >> past the initial try-out phase, because the default settings
> >>> > > >> will be kind of a compromise. I assume that this is part of
> >>> > > >> the outstanding FLIP-49 documentation task.
> >>> > > >>
> >>> > > >> If we limit RocksDB's memory consumption by default, then I
> >>> > > >> believe that 0.4 would give the better all-round experience,
> >>> > > >> as it leaves a bit more memory for RocksDB. However, I'm a bit
> >>> > > >> sceptical whether we should optimize the default settings for
> >>> > > >> a configuration where the user still needs to activate the
> >>> > > >> strict memory limiting for RocksDB. In this case, I would
> >>> > > >> expect that the user could also adapt the managed memory
> >>> > > >> fraction.
> >>> > > >>
> >>> > > >> Cheers,
> >>> > > >> Till
> >>> > > >>
> >>> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song
> >>> > > >> <tonysong...@gmail.com> wrote:
> >>> > > >>
> >>> > > >>> Thanks for the feedback, Stephan and Kurt.
> >>> > > >>>
> >>> > > >>> @Stephan
> >>> > > >>>
> >>> > > >>> Regarding the managed memory fraction,
> >>> > > >>> - It makes sense to keep the default value 0.4 if we assume
> >>> > > >>> RocksDB memory is limited by default.
> >>> > > >>> - AFAIK, RocksDB currently does not limit its memory usage by
> >>> > > >>> default, and I'm positive about changing that.
> >>> > > >>> - Personally, I don't like the idea that the out-of-the-box
> >>> > > >>> experience (for which we set the default fraction) relies on
> >>> > > >>> users manually turning another switch on (see the sketch
> >>> > > >>> below).
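> >>> > > >>>
> >>> > > >>> (The switch in question would be something like the
> >>> > > >>> following in flink-conf.yaml -- a sketch, assuming the
> >>> > > >>> RocksDB backend gets an option to bound its memory usage by
> >>> > > >>> the managed memory budget; the exact key is still being
> >>> > > >>> worked on:
> >>> > > >>>
> >>> > > >>>     taskmanager.memory.managed.fraction: 0.4
> >>> > > >>>     state.backend.rocksdb.memory.managed: true
> >>> > > >>> )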
> >>> > > >>>
> >>> > > >>> Regarding the framework heap memory,
> >>> > > >>> - The major reason we set it by default is, as you mentioned,
> >>> > > >>> to have a safety net of minimal JVM heap size.
> >>> > > >>> - Also, considering the in-progress FLIP-56 (dynamic slot
> >>> > > >>> allocation), we want to reserve some heap memory that will
> >>> > > >>> not go into the slot profiles. That's why we decided the
> >>> > > >>> default value according to the heap memory usage of an empty
> >>> > > >>> task executor.
> >>> > > >>>
> >>> > > >>> @Kurt
> >>> > > >>> Regarding the metaspace,
> >>> > > >>> - This config option ("taskmanager.memory.jvm-metaspace")
> >>> > > >>> only takes effect on TMs. Currently we do not set the
> >>> > > >>> metaspace size for the JM.
> >>> > > >>> - If we have the same metaspace problem on TMs, then yes,
> >>> > > >>> changing it from 128M to 64M will make it worse. However,
> >>> > > >>> IMO a 10T tpc-ds benchmark should not be considered the
> >>> > > >>> out-of-the-box experience, and it makes sense to tune the
> >>> > > >>> configuration for it. I think the smaller metaspace size
> >>> > > >>> would be a better choice for the first try-out, where a job
> >>> > > >>> should not be too complicated and the TM size could be
> >>> > > >>> relatively small (e.g. 1g).
> >>> > > >>>
> >>> > > >>> Thank you~
> >>> > > >>>
> >>> > > >>> Xintong Song
> >>> > > >>>
> >>> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <ykt...@gmail.com>
> >>> > > >>> wrote:
> >>> > > >>>
> >>> > > >>>> Hi Xintong,
> >>> > > >>>>
> >>> > > >>>> IIRC, during our tpc-ds 10T benchmark we suffered from the
> >>> > > >>>> JM's metaspace size and full GCs caused by lots of class
> >>> > > >>>> loading for source input splits. Could you check whether
> >>> > > >>>> changing the default value from 128MB to 64MB will make it
> >>> > > >>>> worse?
> >>> > > >>>>
> >>> > > >>>> Correct me if I misunderstood anything, also cc @Jingsong
> >>> > > >>>>
> >>> > > >>>> Best,
> >>> > > >>>> Kurt
> >>> > > >>>>
> >>> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen
> >>> > > >>>> <se...@apache.org> wrote:
> >>> > > >>>>
> >>> > > >>>>> Hi all!
> >>> > > >>>>>
> >>> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on
> >>> > > >>>>> your analysis, here are some thoughts:
> >>> > > >>>>>
> >>> > > >>>>> +1 to change the default JVM metaspace size from 128MB to
> >>> > > >>>>> 64MB
> >>> > > >>>>> +1 to change the default JVM overhead min size from 128MB
> >>> > > >>>>> to 196MB
> >>> > > >>>>>
> >>> > > >>>>> Concerning the managed memory fraction, I am not sure I
> >>> > > >>>>> would change it, for the following reasons:
> >>> > > >>>>>
> >>> > > >>>>> - We should assume RocksDB will be limited to managed
> >>> > > >>>>> memory by default. This will either be active by default,
> >>> > > >>>>> or we would encourage everyone to use it by default,
> >>> > > >>>>> because otherwise it is super hard to reason about the
> >>> > > >>>>> RocksDB footprint.
> >>> > > >>>>> - For standalone, a managed memory fraction of 0.3 is less
> >>> > > >>>>> than half of the managed memory from 1.9.
> >>> > > >>>>> - I am not sure the managed memory fraction is a value that
> >>> > > >>>>> all users adjust immediately when scaling up the memory
> >>> > > >>>>> during their first try-out phase. I would assume that most
> >>> > > >>>>> users initially only adjust "memory.flink.size" or
> >>> > > >>>>> "memory.process.size". A value of 0.3 will lead to too
> >>> > > >>>>> large heaps and very little RocksDB / batch memory, even
> >>> > > >>>>> when scaling up during the initial exploration.
> >>> > > >>>>> - I agree, though, that 0.5 looks too aggressive, judging
> >>> > > >>>>> from your benchmarks. So maybe keeping it at 0.4 could
> >>> > > >>>>> work?
> >>> > > >>>>>
> >>> > > >>>>> And one question: why do we set the framework heap by
> >>> > > >>>>> default? Is that so that we reduce the managed memory
> >>> > > >>>>> further if less than the framework heap would be left from
> >>> > > >>>>> the JVM heap?
> >>> > > >>>>>
> >>> > > >>>>> Best,
> >>> > > >>>>> Stephan
> >>> > > >>>>>
> >>> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song
> >>> > > >>>>> <tonysong...@gmail.com> wrote:
> >>> > > >>>>>
> >>> > > >>>>> > Hi all,
> >>> > > >>>>> >
> >>> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the
> >>> > > >>>>> > default configuration values of FLIP-49 with more jobs
> >>> > > >>>>> > and cases.
> >>> > > >>>>> >
> >>> > > >>>>> > After spending time analyzing and tuning the
> >>> > > >>>>> > configurations, I've come up with several findings. To be
> >>> > > >>>>> > brief, I would suggest the following changes, and for
> >>> > > >>>>> > more details please take a look at my tuning report [2].
> >>> > > >>>>> >
> >>> > > >>>>> > - Change the default managed memory fraction from 0.4 to
> >>> > > >>>>> > 0.3.
> >>> > > >>>>> > - Change the default JVM metaspace size from 128MB to
> >>> > > >>>>> > 64MB.
> >>> > > >>>>> > - Change the default JVM overhead min size from 128MB to
> >>> > > >>>>> > 196MB.
> >>> > > >>>>> >
> >>> > > >>>>> > Looking forward to your feedback.
> >>> > > >>>>> >
> >>> > > >>>>> > Thank you~
> >>> > > >>>>> >
> >>> > > >>>>> > Xintong Song
> >>> > > >>>>> >
> >>> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
> >>> > > >>>>> > [2]
> >>> > > >>>>> > https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
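> >>> > > >>>>> >
> >>> > > >>>>> > P.S. For context: the metaspace and overhead values are
> >>> > > >>>>> > not just bookkeeping; they end up as hard limits on the
> >>> > > >>>>> > TM JVM, roughly like the following launch flags (a
> >>> > > >>>>> > sketch of the shape, assuming the proposed 64MB
> >>> > > >>>>> > metaspace; not the exact flags the final implementation
> >>> > > >>>>> > generates):
> >>> > > >>>>> >
> >>> > > >>>>> >     -Xmx / -Xms              <- framework heap + task heap
> >>> > > >>>>> >     -XX:MaxMetaspaceSize=64m
> >>> > > >>>>> >     -XX:MaxDirectMemorySize=<framework/task off-heap + network>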