>
> Would the default weight for OPERATOR and STATE_BACKEND be the same value?
>

I would say yes, to stay aligned with the previous behavior.
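
A minimal sketch of the fallback discussed in the quoted thread below. It assumes a comma-separated `KEY:weight` format for `taskmanager.memory.managed.consumer-weights`. The function names and example values are hypothetical, and this is not Flink's actual implementation:

```python
from typing import Dict

# Hypothetical names for illustration only; not Flink's code.
LEGACY_KEY = "DATAPROC"
# Consumers that the legacy key stands in for when they are not set explicitly.
COVERED_BY_LEGACY = ("OPERATOR", "STATE_BACKEND")


def parse_weights(raw: str) -> Dict[str, int]:
    """Parse 'KEY:weight,KEY:weight' (assumed format) into a dict."""
    weights: Dict[str, int] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        key, _, value = entry.partition(":")
        weights[key.strip()] = int(value)
    return weights


def resolve_weights(raw: str) -> Dict[str, int]:
    """Apply the legacy fallback: DATAPROC fills in OPERATOR and
    STATE_BACKEND only where they are not explicitly configured."""
    configured = parse_weights(raw)
    resolved = {k: v for k, v in configured.items() if k != LEGACY_KEY}
    legacy = configured.get(LEGACY_KEY)
    if legacy is not None:
        for key in COVERED_BY_LEGACY:
            resolved.setdefault(key, legacy)
    return resolved
```

With this reading, a configuration that only sets the legacy key gives `OPERATOR` and `STATE_BACKEND` the same weight, which matches the "same default value" answer above. An explicitly configured `OPERATOR` or `STATE_BACKEND` takes precedence over `DATAPROC`.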


Thank you~

Xintong Song



On Tue, Jan 5, 2021 at 5:51 PM Till Rohrmann <trohrm...@apache.org> wrote:

> +1 for Jark's and Xintong's proposal.
>
> Would the default weight for OPERATOR and STATE_BACKEND be the same value?
>
> Cheers,
> Till
>
> On Tue, Jan 5, 2021 at 6:39 AM Jingsong Li <jingsongl...@gmail.com> wrote:
>
> > +1 for allowing streaming operators to use managed memory.
> >
> > The memory use of streams requires some hierarchy, and the bottom layer is
> > undoubtedly the current StateBackend.
> > Letting the stream operators freely use the managed memory will unify the
> > memory management model and give the operators free space.
> >
> > Xintong's proposal looks good to me. +1 to split `DATAPROC` into
> > `STATE_BACKEND` and `OPERATOR`.
> >
> > Best,
> > Jingsong
> >
> > On Tue, Jan 5, 2021 at 12:33 PM Jark Wu <imj...@gmail.com> wrote:
> >
> > > +1 to Xintong's proposal!
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 5 Jan 2021 at 12:13, Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > > +1 for allowing streaming operators to use managed memory.
> > > >
> > > > As for the consumer names, I'm afraid using `DATAPROC` for both streaming
> > > > ops and state backends will not work. Currently, RocksDB state backend uses
> > > > a shared piece of memory for all the states within that slot. It's not the
> > > > operator's decision how much memory it uses for the states.
> > > >
> > > > I would suggest the following. (IIUC, the same as what Jark proposed)
> > > > * `OPERATOR` for both streaming and batch operators
> > > > * `STATE_BACKEND` for state backends
> > > > * `PYTHON` for python processes
> > > > * `DATAPROC` as a legacy key for state backend or batch operators if
> > > > `STATE_BACKEND` or `OPERATOR` are not specified.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Tue, Jan 5, 2021 at 11:23 AM Jark Wu <imj...@gmail.com> wrote:
> > > >
> > > > > Hi Aljoscha,
> > > > >
> > > > > I think we may need to divide `DATAPROC` into `OPERATOR` and
> > > > > `STATE_BACKEND`, because they have different scope (slot vs. operator).
> > > > > But @Xintong Song <tonysong...@gmail.com> may have more insights on it.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > > On Mon, 4 Jan 2021 at 20:44, Aljoscha Krettek <aljos...@apache.org> wrote:
> > > > >
> > > > > >> I agree, we should allow streaming operators to use managed memory for
> > > > > >> other use cases.
> > > > > >>
> > > > > >> Do you think we need an additional "consumer" setting or that they would
> > > > > >> just use `DATAPROC` and decide by themselves what to use the memory for?
> > > > >>
> > > > >> Best,
> > > > >> Aljoscha
> > > > >>
> > > > >> On 2020/12/22 17:14, Jark Wu wrote:
> > > > >> >Hi all,
> > > > >> >
> > > > > >> >I found that currently the managed memory can only be used in 3 workloads
> > > > > >> >[1]:
> > > > >> >- state backends for streaming jobs
> > > > >> >- sorting, hash tables for batch jobs
> > > > >> >- python UDFs
> > > > >> >
> > > > > >> >And the configuration option `taskmanager.memory.managed.consumer-weights`
> > > > > >> >only allows values: PYTHON and DATAPROC (state in streaming or algorithms
> > > > > >> >in batch).
> > > > > >> >I'm confused why it doesn't allow streaming operators to use managed memory
> > > > > >> >for purposes other than state backends.
> > > > >> >
> > > > > >> >The background is that we are planning to use some batch algorithms
> > > > > >> >(sorting & bytes hash table) to improve the performance of streaming SQL
> > > > > >> >operators, especially for the mini-batch operators.
> > > > > >> >Currently, the mini-batch operators are buffering input records and
> > > > > >> >accumulators in heap (i.e. Java HashMap) which is not efficient and there
> > > > > >> >are potential risks of full GC and OOM.
> > > > > >> >With the managed memory, we can fully use the memory to buffer more data
> > > > > >> >without worrying about OOM and improve the performance a lot.
> > > > >> >
> > > > > >> >What do you think about allowing streaming operators to use managed memory
> > > > > >> >and exposing it in configuration?
> > > > >> >
> > > > >> >Best,
> > > > >> >Jark
> > > > >> >
> > > > >> >[1]:
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/memory/mem_setup_tm.html#managed-memory
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>
