Thanks all for the discussion.

We agreed to have process.size in the default settings with the explanation
> of flink.size alternative in the comment.
> This way we keep -tm as a shortcut to process.size only and any
> inconsistencies fail fast as if configured in yaml.
>

The conclusions sounds good to me. I'll update the JIRA description and PR
for FLINK-15530.

+1 to resolve this discussion thread. We can keep the default values
discussion in the "tuning FLIP-49 default configuration values" ML thread.


Thank you~

Xintong Song



On Wed, Jan 15, 2020 at 5:54 AM Andrey Zagrebin <azagre...@apache.org>
wrote:

> Hi all,
>
> Stephan, Till and me had another offline discussion today. Here is the
> outcome of our brainstorm.
>
> We agreed to have process.size in the default settings with the explanation
> of flink.size alternative in the comment.
> This way we keep -tm as a shortcut to process.size only and any
> inconsistencies fail fast as if configured in yaml.
> I will also follow-up on the thread "[Discuss] Tuning FLIP-49 configuration
> default values" with a bit more details.
>
> If no further objections, we can conclude this last point in this
> discussion.
>
> Best,
> Andrey
>
> On Tue, Jan 14, 2020 at 4:28 PM Stephan Ewen <se...@apache.org> wrote:
>
> > I think that problem exists anyways and is independent of the "-tm"
> option.
> >
> > You can have a combination of `task.heap.size` and `managed.size` and
> > `framework.heap.size` that conflicts with `flink.size`. In that case, we
> > get an exception during the startup (incompatible memory configuration)?
> > That is the price for having these "shortcut" options (`flink.size` and
> > `process.size`). But it is a fair price, because the shortcuts are very
> > valuable to most users.
> >
> > That is added with the "-tm" setting is that this is an easy way to get
> the
> > shortcut setting added, even if it was not in the config. So where a
> config
> > worked in standalone, it can now fail in a container setup when "-tm" is
> > used.
> > But that is expected, I guess, when you start manually tune different
> > memory types and end up defining an inconsistent combination. And it
> never
> > conflicts with the default setup, only with manually tuned regions.
> >
> > But this example shows why we need to have log output for the logic that
> > validates and configures the memory settings, so that users understand
> what
> > is happening.
> >
> > Best,
> > Stephan
> >
> >
> > On Tue, Jan 14, 2020 at 2:54 PM Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > Clearing the `flink.size` option and setting `process.size` could
> indeed
> > be
> > > a solution. The thing I'm wondering is what would happen if the user
> has
> > > configured `task.heap.size` and `managed.size` instead of `flink.size`?
> > > Would we also ignore these settings? If not, then we risk to run into
> the
> > > situation that TaskExecutorResourceUtils fails because the memory does
> > not
> > > add up. I guess the thing which bugs me a bit is the special casing
> which
> > > could lead easily into inconsistent behaviour if we don't cover all
> > cases.
> > > The consequence of using slightly different concepts (`flink.size`,
> > > `process.size`) in standalone vs. container/Yarn/Mesos mode in order to
> > > keep the status quo is an increased maintenance overhead which we
> should
> > be
> > > aware of.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jan 14, 2020 at 3:59 AM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > >
> > > > True, even we have "process.size" rather than "flink.size" in the
> > default
> > > > config file, user can still have "flink.size" in their custom config
> > > file.
> > > > If we consider "-tm" as a shortcut for users to override the TM
> memory
> > > > size, then it makes less sense to require users to remove
> "flink.size"
> > > from
> > > > their config file whenever then want to use "-tm".
> > > >
> > > > I'm convinced that ignoring "flink.size" might not be a bad idea.
> > > > And I think we should also update the description of "-tm" (in
> > > > "FlinkYarnSessionCli") to explicitly mention that it would overwrite
> > > > "flink.size" and "process.size".
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <se...@apache.org>
> wrote:
> > > >
> > > > > Would be good to hear the thoughts of some more Yarn users, though.
> > > > >
> > > > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <se...@apache.org>
> > wrote:
> > > > >
> > > > > > I think we need an interpretation of "-tm" regardless of what is
> in
> > > the
> > > > > > default configuration, because we can always have a modified
> > > > > configuration
> > > > > > and then a user passes the "-tm" flag.
> > > > > >
> > > > > > I kind of like the first proposal: Interpret "-tm" as "override
> > > memory
> > > > > > size config and set the Yarn TM container size." It would mean
> > > exactly
> > > > > > ignoring "taskmanager.memory.flink.size" and using the "-tm"
> value
> > > as "
> > > > > > "taskmanager.memory.process.size".
> > > > > > That does not sound too bad to me.
> > > > > >
> > > > > > Best,
> > > > > > Stephan
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin <
> > > azagre...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> While working on changing process memory to Flink memory in
> > default
> > > > > >> configuration, Xintong encountered a problem.
> > > > > >> When -tm option is used to rewrite container memory, basically
> > > process
> > > > > >> memory, it can collide with the default Flink memory.
> > > > > >> For legacy users it should not be a problem as we adjusted the
> > > legacy
> > > > > heap
> > > > > >> size option to be interpreted differently for standalone and
> > > container
> > > > > >> modes.
> > > > > >>
> > > > > >> One solution could be to say in -tm docs that we rewrite both
> > > options
> > > > > >> under
> > > > > >> the hood: process and Flink memory, basically unset Flink memory
> > > from
> > > > > yaml
> > > > > >> config.
> > > > > >> The downside is that this adds more magic.
> > > > > >>
> > > > > >> Alternatively, we can keep process memory in default config and,
> > as
> > > > > >> mentioned before, increase it to maintain the user experience by
> > > > > matching
> > > > > >> the previous default setting for heap (now Flink in standalone)
> > > size.
> > > > > >> The Flink memory can be mentioned in process memory comment as a
> > > > simpler
> > > > > >> alternative which does not require accounting for JVM overhead.
> > > > > >> The downside is again more confusion while trying out Flink and
> > > tuning
> > > > > >> memory at the same time.
> > > > > >> On the other hand, if memory already needs to be tuned it should
> > > > > >> quite quickly lead to the necessity of understanding the memory
> > > model
> > > > in
> > > > > >> Flink.
> > > > > >>
> > > > > >> Best,
> > > > > >> Andrey
> > > > > >>
> > > > > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <se...@apache.org>
> > > > wrote:
> > > > > >>
> > > > > >> > Great! Thanks, guys, for the continued effort on this topic!
> > > > > >> >
> > > > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <
> > > tonysong...@gmail.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Thanks all for the discussion. I believe we have get
> consensus
> > > on
> > > > > all
> > > > > >> the
> > > > > >> > > open questions discussed in this thread.
> > > > > >> > >
> > > > > >> > > Since Andrey already create a jira ticket for renaming
> shuffle
> > > > > memory
> > > > > >> > > config keys with "taskmanager.memory.network.*", I'll create
> > > > ticket
> > > > > >> for
> > > > > >> > the
> > > > > >> > > other topic that puts flink.size in flink-conf.yaml.
> > > > > >> > >
> > > > > >> > > Thank you~
> > > > > >> > >
> > > > > >> > > Xintong Song
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin <
> > > > > azagre...@apache.org>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > It also looks to me that we should only swap network and
> > > memory
> > > > in
> > > > > >> the
> > > > > >> > > > option names: 'taskmanager.memory.network.*'.
> > > > > >> > > > There is no strong consensus towards using new 'shuffle'
> > > naming
> > > > so
> > > > > >> we
> > > > > >> > can
> > > > > >> > > > just rename it to  'taskmanager.memory.network.*' as
> > 'shuffle'
> > > > > >> naming
> > > > > >> > has
> > > > > >> > > > never been released.
> > > > > >> > > > When we have other shuffle services and start advertising
> > more
> > > > > this
> > > > > >> > > concept
> > > > > >> > > > in Flink, we could revisit again the whole naming for this
> > > > > concept.
> > > > > >> > > > https://jira.apache.org/jira/browse/FLINK-15517
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to