Thanks all for the discussion. We agreed to have process.size in the default settings with the explanation > of flink.size alternative in the comment. > This way we keep -tm as a shortcut to process.size only and any > inconsistencies fail fast as if configured in yaml. >
The conclusions sounds good to me. I'll update the JIRA description and PR for FLINK-15530. +1 to resolve this discussion thread. We can keep the default values discussion in the "tuning FLIP-49 default configuration values" ML thread. Thank you~ Xintong Song On Wed, Jan 15, 2020 at 5:54 AM Andrey Zagrebin <azagre...@apache.org> wrote: > Hi all, > > Stephan, Till and me had another offline discussion today. Here is the > outcome of our brainstorm. > > We agreed to have process.size in the default settings with the explanation > of flink.size alternative in the comment. > This way we keep -tm as a shortcut to process.size only and any > inconsistencies fail fast as if configured in yaml. > I will also follow-up on the thread "[Discuss] Tuning FLIP-49 configuration > default values" with a bit more details. > > If no further objections, we can conclude this last point in this > discussion. > > Best, > Andrey > > On Tue, Jan 14, 2020 at 4:28 PM Stephan Ewen <se...@apache.org> wrote: > > > I think that problem exists anyways and is independent of the "-tm" > option. > > > > You can have a combination of `task.heap.size` and `managed.size` and > > `framework.heap.size` that conflicts with `flink.size`. In that case, we > > get an exception during the startup (incompatible memory configuration)? > > That is the price for having these "shortcut" options (`flink.size` and > > `process.size`). But it is a fair price, because the shortcuts are very > > valuable to most users. > > > > That is added with the "-tm" setting is that this is an easy way to get > the > > shortcut setting added, even if it was not in the config. So where a > config > > worked in standalone, it can now fail in a container setup when "-tm" is > > used. > > But that is expected, I guess, when you start manually tune different > > memory types and end up defining an inconsistent combination. And it > never > > conflicts with the default setup, only with manually tuned regions. > > > > But this example shows why we need to have log output for the logic that > > validates and configures the memory settings, so that users understand > what > > is happening. > > > > Best, > > Stephan > > > > > > On Tue, Jan 14, 2020 at 2:54 PM Till Rohrmann <trohrm...@apache.org> > > wrote: > > > > > Clearing the `flink.size` option and setting `process.size` could > indeed > > be > > > a solution. The thing I'm wondering is what would happen if the user > has > > > configured `task.heap.size` and `managed.size` instead of `flink.size`? > > > Would we also ignore these settings? If not, then we risk to run into > the > > > situation that TaskExecutorResourceUtils fails because the memory does > > not > > > add up. I guess the thing which bugs me a bit is the special casing > which > > > could lead easily into inconsistent behaviour if we don't cover all > > cases. > > > The consequence of using slightly different concepts (`flink.size`, > > > `process.size`) in standalone vs. container/Yarn/Mesos mode in order to > > > keep the status quo is an increased maintenance overhead which we > should > > be > > > aware of. > > > > > > Cheers, > > > Till > > > > > > On Tue, Jan 14, 2020 at 3:59 AM Xintong Song <tonysong...@gmail.com> > > > wrote: > > > > > > > True, even we have "process.size" rather than "flink.size" in the > > default > > > > config file, user can still have "flink.size" in their custom config > > > file. > > > > If we consider "-tm" as a shortcut for users to override the TM > memory > > > > size, then it makes less sense to require users to remove > "flink.size" > > > from > > > > their config file whenever then want to use "-tm". > > > > > > > > I'm convinced that ignoring "flink.size" might not be a bad idea. > > > > And I think we should also update the description of "-tm" (in > > > > "FlinkYarnSessionCli") to explicitly mention that it would overwrite > > > > "flink.size" and "process.size". > > > > > > > > Thank you~ > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <se...@apache.org> > wrote: > > > > > > > > > Would be good to hear the thoughts of some more Yarn users, though. > > > > > > > > > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <se...@apache.org> > > wrote: > > > > > > > > > > > I think we need an interpretation of "-tm" regardless of what is > in > > > the > > > > > > default configuration, because we can always have a modified > > > > > configuration > > > > > > and then a user passes the "-tm" flag. > > > > > > > > > > > > I kind of like the first proposal: Interpret "-tm" as "override > > > memory > > > > > > size config and set the Yarn TM container size." It would mean > > > exactly > > > > > > ignoring "taskmanager.memory.flink.size" and using the "-tm" > value > > > as " > > > > > > "taskmanager.memory.process.size". > > > > > > That does not sound too bad to me. > > > > > > > > > > > > Best, > > > > > > Stephan > > > > > > > > > > > > > > > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin < > > > azagre...@apache.org> > > > > > > wrote: > > > > > > > > > > > >> Hi all, > > > > > >> > > > > > >> While working on changing process memory to Flink memory in > > default > > > > > >> configuration, Xintong encountered a problem. > > > > > >> When -tm option is used to rewrite container memory, basically > > > process > > > > > >> memory, it can collide with the default Flink memory. > > > > > >> For legacy users it should not be a problem as we adjusted the > > > legacy > > > > > heap > > > > > >> size option to be interpreted differently for standalone and > > > container > > > > > >> modes. > > > > > >> > > > > > >> One solution could be to say in -tm docs that we rewrite both > > > options > > > > > >> under > > > > > >> the hood: process and Flink memory, basically unset Flink memory > > > from > > > > > yaml > > > > > >> config. > > > > > >> The downside is that this adds more magic. > > > > > >> > > > > > >> Alternatively, we can keep process memory in default config and, > > as > > > > > >> mentioned before, increase it to maintain the user experience by > > > > > matching > > > > > >> the previous default setting for heap (now Flink in standalone) > > > size. > > > > > >> The Flink memory can be mentioned in process memory comment as a > > > > simpler > > > > > >> alternative which does not require accounting for JVM overhead. > > > > > >> The downside is again more confusion while trying out Flink and > > > tuning > > > > > >> memory at the same time. > > > > > >> On the other hand, if memory already needs to be tuned it should > > > > > >> quite quickly lead to the necessity of understanding the memory > > > model > > > > in > > > > > >> Flink. > > > > > >> > > > > > >> Best, > > > > > >> Andrey > > > > > >> > > > > > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <se...@apache.org> > > > > wrote: > > > > > >> > > > > > >> > Great! Thanks, guys, for the continued effort on this topic! > > > > > >> > > > > > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song < > > > tonysong...@gmail.com> > > > > > >> wrote: > > > > > >> > > > > > > >> > > Thanks all for the discussion. I believe we have get > consensus > > > on > > > > > all > > > > > >> the > > > > > >> > > open questions discussed in this thread. > > > > > >> > > > > > > > >> > > Since Andrey already create a jira ticket for renaming > shuffle > > > > > memory > > > > > >> > > config keys with "taskmanager.memory.network.*", I'll create > > > > ticket > > > > > >> for > > > > > >> > the > > > > > >> > > other topic that puts flink.size in flink-conf.yaml. > > > > > >> > > > > > > > >> > > Thank you~ > > > > > >> > > > > > > > >> > > Xintong Song > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin < > > > > > azagre...@apache.org> > > > > > >> > > wrote: > > > > > >> > > > > > > > >> > > > It also looks to me that we should only swap network and > > > memory > > > > in > > > > > >> the > > > > > >> > > > option names: 'taskmanager.memory.network.*'. > > > > > >> > > > There is no strong consensus towards using new 'shuffle' > > > naming > > > > so > > > > > >> we > > > > > >> > can > > > > > >> > > > just rename it to 'taskmanager.memory.network.*' as > > 'shuffle' > > > > > >> naming > > > > > >> > has > > > > > >> > > > never been released. > > > > > >> > > > When we have other shuffle services and start advertising > > more > > > > > this > > > > > >> > > concept > > > > > >> > > > in Flink, we could revisit again the whole naming for this > > > > > concept. > > > > > >> > > > https://jira.apache.org/jira/browse/FLINK-15517 > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > >