I'd be fine with these changes. Thanks for the summary, Xintong.

Cheers,
Till
On Wed, Jan 15, 2020 at 11:09 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thank you all for the good discussion.
>
> If there are no further concerns or objections, I would like to conclude
> this thread with the following action items.
>
>    - Change the default value of "taskmanager.memory.jvm-overhead.min"
>    to 192MB.
>    - Change the default value of "taskmanager.memory.jvm-metaspace.size"
>    to 96MB.
>    - Change the value of "taskmanager.memory.process.size" in the
>    default "flink-conf.yaml" to 1568MB.
>    - Relax the JVM overhead sanity check, so that the fraction does not
>    need to be strictly followed, as long as the min/max range is
>    respected.
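>
> To make this concrete, the affected entries would look roughly as
> follows (a sketch of the intended defaults, not the final file; only the
> keys touched by the action items are shown, and the metaspace/overhead
> values may end up as built-in defaults rather than entries in the
> shipped "flink-conf.yaml"):
>
>     taskmanager.memory.process.size: 1568m
>     taskmanager.memory.jvm-metaspace.size: 96m
>     taskmanager.memory.jvm-overhead.min: 192m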
>
> Thank you~
>
> Xintong Song
>
>
> On Wed, Jan 15, 2020 at 5:50 PM Xintong Song <tonysong...@gmail.com>
> wrote:
>
> > There's one more idea from the offline discussion with Andrey.
> >
> > If we decide to make the metaspace 96MB, we can also make process.size
> > 1568MB (1.5GB + 32MB).
> > According to the spreadsheet
> > <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0>,
> > a 1.5GB process size and 64MB metaspace result in memory sizes whose
> > values are powers of 2.
> > When increasing the metaspace from 64MB to 96MB, it would be good to
> > preserve that alignment, for better readability when we later explain
> > the memory configuration and calculations in the documentation.
> > I believe the difference between 1.5GB and 1.5GB + 32MB is negligible
> > for memory consumption.
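> >
> > To illustrate the alignment, here is a rough sketch of the derivation
> > (simplified, not Flink's actual code; the overhead/network fractions
> > and min/max values are the defaults as I understand them, and integer
> > MB arithmetic stands in for the real byte-precise calculation):
> >
> >     // Sketch of the FLIP-49 memory derivation, in MB.
> >     public class MemoryBreakdown {
> >         public static void main(String[] args) {
> >             long process = 1568;   // proposed process.size (1.5GB + 32MB)
> >             long metaspace = 96;   // proposed jvm-metaspace.size
> >             // JVM overhead: fraction 0.1, clamped to [192MB, 1GB].
> >             long overhead = clamp((long) (process * 0.1), 192, 1024);
> >             long flink = process - overhead - metaspace;          // 1280
> >             // Network memory: fraction 0.1, clamped to [64MB, 1GB].
> >             long network = clamp((long) (flink * 0.1), 64, 1024); // 128
> >             long managed = (long) (flink * 0.4);                  // 512
> >             long frameworkHeap = 128, frameworkOffHeap = 128;
> >             long taskHeap = flink - frameworkHeap - frameworkOffHeap
> >                     - network - managed;                          // 384
> >             System.out.printf("flink=%d managed=%d network=%d taskHeap=%d%n",
> >                     flink, managed, network, taskHeap);
> >         }
> >
> >         static long clamp(long v, long min, long max) {
> >             return Math.max(min, Math.min(max, v));
> >         }
> >     }
> >
> > With 1568MB process size and 96MB metaspace this prints the same
> > power-of-2 breakdown as 1536MB with 64MB metaspace: flink=1280,
> > managed=512, network=128, taskHeap=384.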
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > On Wed, Jan 15, 2020 at 11:55 AM Xintong Song <tonysong...@gmail.com>
> > wrote:
> >
> >> Thanks for the discussion, Stephan, Till and Andrey.
> >>
> >> +1 for the managed fraction (0.4) and process.size (1.5G).
> >>
> >> *JVM overhead min 196 -> 192Mb (128 + 64)*
> >>> small correction for better power-of-2 alignment of sizes
> >>
> >> Sorry, this was a typo (and the same for the JIRA comment, which is
> >> copy-pasted). It was 192MB that was used in the tuning report.
> >>
> >> *meta space at least 96Mb?*
> >>> There is still a concern about JVM metaspace being just 64MB.
> >>> We should confirm that it is not a problem by testing it also with
> >>> SQL jobs and the Blink planner.
> >>> Also, by running the tpc-ds e2e Flink tests with this setting.
> >>> Basically, wherever more classes are generated/loaded.
> >>> We can look into this tomorrow.
> >>
> >> I have already tried setting the metaspace to 64MB with the e2e tests,
> >> where I believe various sql / blink / tpc-ds test cases are included.
> >> (See https://travis-ci.com/flink-ci/flink/builds/142970113 )
> >> However, I'm also ok with 96MB, since we are increasing the
> >> process.size to 1.5G.
> >> My original concern with a larger metaspace size was that we might end
> >> up with a too small flink.size for the out-of-the-box configuration on
> >> containerized setups.
> >>
> >> *sanity check of JVM overhead*
> >>> When the explicitly configured process and flink memory sizes are
> >>> verified against the JVM metaspace and overhead, the JVM overhead
> >>> does not have to be the exact fraction.
> >>> It can be just within its min/max range, similar to how it is now for
> >>> the network/shuffle memory check after FLINK-15300.
> >>
> >> Also +1 for this.
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >> On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <azagre...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Stephan, Till and I had another offline discussion today. Here is the
> >>> outcome of our brainstorm.
> >>>
> >>> *managed fraction 0.4*
> >>> This just confirms what we already discussed here.
> >>>
> >>> *process.size = 1536Mb (1.5Gb)*
> >>> We agreed to have process.size in the default settings, with an
> >>> explanation of the flink.size alternative in the comment.
> >>> The suggestion is to increase it from 1024 to 1536MB. As you can see
> >>> in the earlier provided calculation spreadsheet, it will result in a
> >>> bigger JVM heap and managed memory (both ~0.5GB) for all new setups.
> >>> This should provide a good enough experience for trying out Flink.
> >>>
> >>> *JVM overhead min 196 -> 192Mb (128 + 64)*
> >>> small correction for better power-of-2 alignment of sizes
> >>>
> >>> *meta space at least 96Mb?*
> >>> There is still a concern about JVM metaspace being just 64MB.
> >>> We should confirm that it is not a problem by testing it also with
> >>> SQL jobs and the Blink planner.
> >>> Also, by running the tpc-ds e2e Flink tests with this setting.
> >>> Basically, wherever more classes are generated/loaded.
> >>> We can look into this tomorrow.
> >>>
> >>> *sanity check of JVM overhead*
> >>> When the explicitly configured process and flink memory sizes are
> >>> verified against the JVM metaspace and overhead, the JVM overhead
> >>> does not have to be the exact fraction.
> >>> It can be just within its min/max range, similar to how it is now for
> >>> the network/shuffle memory check after FLINK-15300.
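> >>>
> >>> For illustration, the relaxed check would look roughly like this (a
> >>> sketch with hypothetical names, not the actual code in
> >>> flink-runtime):
> >>>
> >>>     // Sketch: the JVM overhead derived from the explicitly
> >>>     // configured process/flink sizes only needs to fall within the
> >>>     // configured [min, max] range, instead of matching
> >>>     // process.size * overhead-fraction exactly.
> >>>     class JvmOverheadSanityCheck {
> >>>         // derivedOverheadMb = process.size - flink.size - metaspace
> >>>         static void check(long derivedOverheadMb, long minMb, long maxMb) {
> >>>             if (derivedOverheadMb < minMb || derivedOverheadMb > maxMb) {
> >>>                 throw new IllegalArgumentException(
> >>>                         "Derived JVM overhead " + derivedOverheadMb
> >>>                                 + "MB is outside the configured range ["
> >>>                                 + minMb + "MB, " + maxMb + "MB].");
> >>>             }
> >>>             // No exact-fraction check anymore.
> >>>         }
> >>>     }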
> >>>
> >>> Best,
> >>> Andrey
> >>>
> >>> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <se...@apache.org> wrote:
> >>>
> >>> > I like the idea of having a larger default "flink.size" in the
> >>> > config.yaml. Maybe we don't need to double it, but something like
> >>> > 1280m would be okay?
> >>> >
> >>> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin
> >>> > <azagre...@apache.org> wrote:
> >>> >
> >>> > > Hi all!
> >>> > >
> >>> > > Great that we have already tried out the new FLIP-49 with bigger
> >>> > > jobs.
> >>> > >
> >>> > > I am also +1 for the JVM metaspace and overhead changes.
> >>> > >
> >>> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more
> >>> > > managed memory for the RocksDB limiting case.
> >>> > >
> >>> > > In general, this looks mostly to be about memory distribution
> >>> > > between the JVM heap and managed off-heap memory.
> >>> > > Compared to the previous default setup, the JVM heap dropped
> >>> > > (especially for standalone), mostly due to moving managed memory
> >>> > > from heap to off-heap, and then also adding framework off-heap
> >>> > > memory.
> >>> > > In general, this can be the most important consequence for
> >>> > > beginners and those who rely on the default configuration.
> >>> > > This especially affects the legacy default configuration in
> >>> > > standalone mode, where heap.size falls back to flink.size, but
> >>> > > there it seems we cannot do much now.
> >>> > >
> >>> > > I prepared a spreadsheet
> >>> > > <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE>
> >>> > > to play with the numbers for the setups mentioned in the report.
> >>> > >
> >>> > > One idea would be to set the process size (or, respectively, a
> >>> > > smaller flink size) to a bigger default number, like 2048.
> >>> > > In this case, the derived absolute default JVM heap and managed
> >>> > > memory are close to the previous defaults, especially for managed
> >>> > > fraction 0.3.
> >>> > > This should align the defaults with the previous standalone
> >>> > > try-out experience, where the increased off-heap memory is not
> >>> > > strictly controlled by the environment anyway.
> >>> > > The consequence for container users who relied on and updated the
> >>> > > default configuration is that the containers will be requested at
> >>> > > double the size.
> >>> > >
> >>> > > Best,
> >>> > > Andrey
> >>> > >
> >>> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann
> >>> > > <trohrm...@apache.org> wrote:
> >>> > >
> >>> > > > +1 for the JVM metaspace and overhead changes.
> >>> > > >
> >>> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann
> >>> > > > <trohrm...@apache.org> wrote:
> >>> > > >
> >>> > > >> I guess one of the most important results of this experiment
> >>> > > >> is to have a good tuning guide available for users who are
> >>> > > >> past the initial try-out phase, because the default settings
> >>> > > >> will be kind of a compromise. I assume that this is part of
> >>> > > >> the outstanding FLIP-49 documentation task.
> >>> > > >>
> >>> > > >> If we limit RocksDB's memory consumption by default, then I
> >>> > > >> believe that 0.4 would give the better all-round experience,
> >>> > > >> as it leaves a bit more memory for RocksDB. However, I'm a bit
> >>> > > >> sceptical whether we should optimize the default settings for
> >>> > > >> a configuration where the user still needs to activate the
> >>> > > >> strict memory limiting for RocksDB. In this case, I would
> >>> > > >> expect that the user could also adapt the managed memory
> >>> > > >> fraction.
> >>> > > >>
> >>> > > >> Cheers,
> >>> > > >> Till
> >>> > > >>
> >>> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song
> >>> > > >> <tonysong...@gmail.com> wrote:
> >>> > > >>
> >>> > > >>> Thanks for the feedback, Stephan and Kurt.
> >>> > > >>>
> >>> > > >>> @Stephan
> >>> > > >>>
> >>> > > >>> Regarding the managed memory fraction,
> >>> > > >>> - It makes sense to keep the default value 0.4 if we assume
> >>> > > >>> RocksDB memory is limited by default.
> >>> > > >>> - AFAIK, RocksDB currently does not limit its memory usage by
> >>> > > >>> default, and I'm positive about changing that.
> >>> > > >>> - Personally, I don't like the idea that the out-of-the-box
> >>> > > >>> experience (for which we set the default fraction) relies on
> >>> > > >>> users manually turning another switch on (see the sketch
> >>> > > >>> below).
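> >>> > > >>>
> >>> > > >>> (The switch in question would be something like the
> >>> > > >>> following in flink-conf.yaml -- a sketch, assuming the
> >>> > > >>> RocksDB backend gets an option to bound its memory usage by
> >>> > > >>> the managed memory budget; the exact key is still being
> >>> > > >>> worked on:
> >>> > > >>>
> >>> > > >>>     taskmanager.memory.managed.fraction: 0.4
> >>> > > >>>     state.backend.rocksdb.memory.managed: true
> >>> > > >>> )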
> >>> > > >>>
> >>> > > >>> Regarding the framework heap memory,
> >>> > > >>> - The major reason we set it by default is, as you mentioned,
> >>> > > >>> to have a safety net of minimal JVM heap size.
> >>> > > >>> - Also, considering the in-progress FLIP-56 (dynamic slot
> >>> > > >>> allocation), we want to reserve some heap memory that will
> >>> > > >>> not go into the slot profiles. That's why we decided the
> >>> > > >>> default value according to the heap memory usage of an empty
> >>> > > >>> task executor.
> >>> > > >>>
> >>> > > >>> @Kurt
> >>> > > >>> Regarding the metaspace,
> >>> > > >>> - This config option ("taskmanager.memory.jvm-metaspace")
> >>> > > >>> only takes effect on TMs. Currently we do not set the
> >>> > > >>> metaspace size for the JM.
> >>> > > >>> - If we have the same metaspace problem on TMs, then yes,
> >>> > > >>> changing it from 128M to 64M will make it worse. However,
> >>> > > >>> IMO a 10T tpc-ds benchmark should not be considered the
> >>> > > >>> out-of-the-box experience, and it makes sense to tune the
> >>> > > >>> configuration for it. I think the smaller metaspace size
> >>> > > >>> would be a better choice for the first try-out, where a job
> >>> > > >>> should not be too complicated and the TM size could be
> >>> > > >>> relatively small (e.g. 1g).
> >>> > > >>>
> >>> > > >>> Thank you~
> >>> > > >>>
> >>> > > >>> Xintong Song
> >>> > > >>>
> >>> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <ykt...@gmail.com>
> >>> > > >>> wrote:
> >>> > > >>>
> >>> > > >>>> Hi Xintong,
> >>> > > >>>>
> >>> > > >>>> IIRC, during our tpc-ds 10T benchmark we suffered from the
> >>> > > >>>> JM's metaspace size and full GCs caused by lots of class
> >>> > > >>>> loading for source input splits. Could you check whether
> >>> > > >>>> changing the default value from 128MB to 64MB will make it
> >>> > > >>>> worse?
> >>> > > >>>>
> >>> > > >>>> Correct me if I misunderstood anything, also cc @Jingsong
> >>> > > >>>>
> >>> > > >>>> Best,
> >>> > > >>>> Kurt
> >>> > > >>>>
> >>> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen
> >>> > > >>>> <se...@apache.org> wrote:
> >>> > > >>>>
> >>> > > >>>>> Hi all!
> >>> > > >>>>>
> >>> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on
> >>> > > >>>>> your analysis, here are some thoughts:
> >>> > > >>>>>
> >>> > > >>>>> +1 to change the default JVM metaspace size from 128MB to
> >>> > > >>>>> 64MB
> >>> > > >>>>> +1 to change the default JVM overhead min size from 128MB
> >>> > > >>>>> to 196MB
> >>> > > >>>>>
> >>> > > >>>>> Concerning the managed memory fraction, I am not sure I
> >>> > > >>>>> would change it, for the following reasons:
> >>> > > >>>>>
> >>> > > >>>>> - We should assume RocksDB will be limited to managed
> >>> > > >>>>> memory by default. This will either be active by default,
> >>> > > >>>>> or we would encourage everyone to use it by default,
> >>> > > >>>>> because otherwise it is super hard to reason about the
> >>> > > >>>>> RocksDB footprint.
> >>> > > >>>>> - For standalone, a managed memory fraction of 0.3 is less
> >>> > > >>>>> than half of the managed memory from 1.9.
> >>> > > >>>>> - I am not sure the managed memory fraction is a value that
> >>> > > >>>>> all users adjust immediately when scaling up the memory
> >>> > > >>>>> during their first try-out phase. I would assume that most
> >>> > > >>>>> users initially only adjust "memory.flink.size" or
> >>> > > >>>>> "memory.process.size". A value of 0.3 will lead to too
> >>> > > >>>>> large heaps and very little RocksDB / batch memory, even
> >>> > > >>>>> when scaling up during the initial exploration.
> >>> > > >>>>> - I agree, though, that 0.5 looks too aggressive, judging
> >>> > > >>>>> from your benchmarks. So maybe keeping it at 0.4 could
> >>> > > >>>>> work?
> >>> > > >>>>>
> >>> > > >>>>> And one question: why do we set the framework heap by
> >>> > > >>>>> default? Is that so that we reduce the managed memory
> >>> > > >>>>> further if less than the framework heap would be left from
> >>> > > >>>>> the JVM heap?
> >>> > > >>>>>
> >>> > > >>>>> Best,
> >>> > > >>>>> Stephan
> >>> > > >>>>>
> >>> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song
> >>> > > >>>>> <tonysong...@gmail.com> wrote:
> >>> > > >>>>>
> >>> > > >>>>> > Hi all,
> >>> > > >>>>> >
> >>> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the
> >>> > > >>>>> > default configuration values of FLIP-49 with more jobs
> >>> > > >>>>> > and cases.
> >>> > > >>>>> >
> >>> > > >>>>> > After spending time analyzing and tuning the
> >>> > > >>>>> > configurations, I've come up with several findings. To be
> >>> > > >>>>> > brief, I would suggest the following changes, and for
> >>> > > >>>>> > more details please take a look at my tuning report [2].
> >>> > > >>>>> >
> >>> > > >>>>> > - Change the default managed memory fraction from 0.4 to
> >>> > > >>>>> > 0.3.
> >>> > > >>>>> > - Change the default JVM metaspace size from 128MB to
> >>> > > >>>>> > 64MB.
> >>> > > >>>>> > - Change the default JVM overhead min size from 128MB to
> >>> > > >>>>> > 196MB.
> >>> > > >>>>> >
> >>> > > >>>>> > Looking forward to your feedback.
> >>> > > >>>>> >
> >>> > > >>>>> > Thank you~
> >>> > > >>>>> >
> >>> > > >>>>> > Xintong Song
> >>> > > >>>>> >
> >>> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
> >>> > > >>>>> > [2]
> >>> > > >>>>> > https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
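> >>> > > >>>>> >
> >>> > > >>>>> > P.S. For context: the metaspace and overhead values are
> >>> > > >>>>> > not just bookkeeping; they end up as hard limits on the
> >>> > > >>>>> > TM JVM, roughly like the following launch flags (a
> >>> > > >>>>> > sketch of the shape, assuming the proposed 64MB
> >>> > > >>>>> > metaspace; not the exact flags the final implementation
> >>> > > >>>>> > generates):
> >>> > > >>>>> >
> >>> > > >>>>> >     -Xmx / -Xms              <- framework heap + task heap
> >>> > > >>>>> >     -XX:MaxMetaspaceSize=64m
> >>> > > >>>>> >     -XX:MaxDirectMemorySize=<framework/task off-heap + network>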