Hi all,

Given the simplicity, I'm also +1 for PTF, or for any other function
implementation if PTF turns out not to be applicable for this.

> I would like to raise a consideration regarding the usage implementation:
> Would it be necessary to allow users to utilize the CREATE FUNCTION
> statement for registering the PTF?


I'd also suggest we make it built-in without registration.

> Currently, Flink SQL supports letting external systems register modules and
> leverage these modules to centrally manage all function definitions. Given
> this architectural approach, I’m curious if the plan involves introducing
> additional functions in the future. If so, I would advocate for introducing
> a dedicated state module to centralize such management. This would empower
> users to:


I can’t think of any further functions for now, but I'd +1 for a module if
it could omit the registration.
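
To make the two options concrete, here is a rough sketch of how I imagine the
usage. The module name `state` is just a placeholder I made up, and the
function arguments are left elided:

```sql
-- Option A: built-in function, nothing to register or load.
SELECT * FROM read_state_metadata(...);

-- Option B: function shipped in a module ('state' is a hypothetical name).
LOAD MODULE state;
SELECT * FROM read_state_metadata(...);
```

Either way, users would not need a CREATE FUNCTION statement.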


Best,
Zakelly.



On Fri, Mar 28, 2025 at 10:25 AM Shengkai Fang <fskm...@gmail.com> wrote:

> One more question about the FLIP.
>
> I think the output schema is definitely a public API to users. If users
> use the `CREATE FUNCTION` statement, does that mean the class path is also a
> public API to users? Alternatively, is this merely an experimental feature
> for which we don't make any promises?
>
> Best,
> Shengkai
>
> On Fri, Mar 28, 2025 at 10:20 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
>> +1 to use PTF.
>>
>> I would like to raise a consideration regarding the usage implementation:
>> Would it be necessary to allow users to utilize the CREATE FUNCTION
>> statement for registering the PTF?
>>
>> Currently, Flink SQL supports letting external systems register modules
>> and leverage these modules to centrally manage all function definitions.
>> Given this architectural approach, I’m curious if the plan involves
>> introducing additional functions in the future. If so, I would advocate for
>> introducing a dedicated state module to centralize such management. This
>> would empower users to:
>>
>> 1. Simply execute the LOAD MODULE command to load the required module, and
>> 2. Directly invoke read_metadata thereafter.
>>
>> For more details about the module, please refer to this document[1].
>>
>> Best,
>> Shengkai
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/
>>
>> On Fri, Mar 28, 2025 at 12:26 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>
>>> Just found out that PTF is not supported in batch mode, please see the dev
>>> mailing list thread about it [1].
>>>
>>> [1] https://lists.apache.org/thread/ytm9m1qt4pq2q2gjngfktrn8vrlvkf07
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Thu, Mar 27, 2025 at 3:38 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>
>>> > In the meantime I've just updated the FLIP according to this to be
>>> > optimistic 🙂
>>> >
>>> > BR,
>>> > G
>>> >
>>> > On Thu, Mar 27, 2025 at 2:15 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> >
>>> >> Considering all the facts, I'm also +1 on PTF. Even if something is
>>> >> missing, we can add it later.
>>> >>
>>> >> @Zakelly Lan <zakelly....@gmail.com> @Shengkai Fang are you also on the
>>> >> same page, or do you have something to add?
>>> >>
>>> >> BR,
>>> >> G
>>> >>
>>> >>
>>> >> On Thu, Mar 27, 2025 at 1:50 PM Lincoln Lee <lincoln.8...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> +1 for PTF
>>> >>>
>>> >>> > Is it possible to describe such function to see the column names/types?
>>> >>>
>>> >>> Although Flink SQL does not directly support this feature, users can
>>> >>> achieve similar results with the help of `explain` syntax, e.g.
>>> >>> 'explain select * from read_state_metadata(...)'
>>> >>>
>>> >>>
>>> >>> Best,
>>> >>> Lincoln Lee
>>> >>>
>>> >>>
>>> >>> On Thu, Mar 27, 2025 at 8:41 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>> >>>
>>> >>> > Hey!
>>> >>> >
>>> >>> > I think the PTF approach strikes a great balance between simplicity
>>> >>> > and the capabilities that we get out of it.
>>> >>> >
>>> >>> > I think this could be a completely viable alternative to the dedicated
>>> >>> > connector, +1.
>>> >>> >
>>> >>> > Cheers,
>>> >>> > Gyula
>>> >>> >
>>> >>> > On Thu, Mar 27, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>> >>> >
>>> >>> > > Hi, Gabor.
>>> >>> > >
>>> >>> > > > Do I understand correctly that this is 2.x only feature and we can't
>>> >>> > > > backport it to 1.x line
>>> >>> > >
>>> >>> > > Yes. PTF is only supported in 2.x versions.
>>> >>> > >
>>> >>> > > > Is it possible to describe such function to see the column names/types?
>>> >>> > >
>>> >>> > > Flink SQL doesn't support this feature, but PostgreSQL[2] and MySQL[1]
>>> >>> > > have a similar feature.
>>> >>> > >
>>> >>> > > [1] https://dev.mysql.com/doc/refman/8.4/en/show-create-procedure.html
>>> >>> > > [2] https://stackoverflow.com/questions/6898453/show-the-code-of-a-function-procedure-and-trigger-in-postgresql
>>> >>> > >
>>> >>> > > Best,
>>> >>> > > Shengkai
>>> >>> > >
>>> >>> > >
>>> >>> > > On Thu, Mar 27, 2025 at 4:25 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> >>> > >
>>> >>> > > > Hi Shengkai,
>>> >>> > > >
>>> >>> > > > Thanks for your effort with the example, this looks promising.
>>> >>> > > > I like the fact that users wouldn't need to sweat with complex
>>> >>> > > > CREATE TABLE statements.
>>> >>> > > >
>>> >>> > > > Couple of questions:
>>> >>> > > > * Do I understand correctly that this is a 2.x-only feature and we
>>> >>> > > > can't backport it to the 1.x line?
>>> >>> > > > I don't intend to do any backport, I would just like to know the
>>> >>> > > > technical constraints.
>>> >>> > > > * Is it possible to describe such a function to see the column
>>> >>> > > > names/types?
>>> >>> > > >
>>> >>> > > > BR,
>>> >>> > > > G
>>> >>> > > >
>>> >>> > > >
>>> >>> > > > On Thu, Mar 27, 2025 at 3:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>> >>> > > >
>>> >>> > > > > Many thanks for your reminder, Leonard. Here's the link I mentioned[1].
>>> >>> > > > >
>>> >>> > > > > Best,
>>> >>> > > > > Shengkai
>>> >>> > > > >
>>> >>> > > > > [1] https://github.com/apache/flink/pull/26358
>>> >>> > > > >
>>> >>> > > > > On Thu, Mar 27, 2025 at 10:05 AM Leonard Xu <xbjt...@gmail.com> wrote:
>>> >>> > > > >
>>> >>> > > > > > Your link is broken, Shengkai
>>> >>> > > > > >
>>> >>> > > > > > Best,
>>> >>> > > > > > Leonard
>>> >>> > > > > >
>>> >>> > > > > > > On Mar 27, 2025, at 10:01, Shengkai Fang <fskm...@gmail.com> wrote:
>>> >>> > > > > > >
>>> >>> > > > > > > Hi, All.
>>> >>> > > > > > >
>>> >>> > > > > > > I wrote a simple demo to illustrate my idea. Hope this helps.
>>> >>> > > > > > >
>>> >>> > > > > > > Best,
>>> >>> > > > > > > Shengkai
>>> >>> > > > > > >
>>> >>> > > > > > > https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1
>>> >>> > > > > > >
>>> >>> > > > > > > On Wed, Mar 26, 2025 at 3:54 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> >>> > > > > > >
>>> >>> > > > > > >>> I'm fine with a separate SQL connector for metadata, so maybe we
>>> >>> > > > > > >>> could update the FLIP about our discussion?
>>> >>> > > > > > >>
>>> >>> > > > > > >> Sorry, I've forgotten this part. Yeah, whichever we choose, I'm
>>> >>> > > > > > >> going to update the FLIP.
>>> >>> > > > > > >>
>>> >>> > > > > > >> G
>>> >>> > > > > > >>
>>> >>> > > > > > >>
>>> >>> > > > > > >> On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> >>> > > > > > >>
>>> >>> > > > > > >>> Hi All,
>>> >>> > > > > > >>>
>>> >>> > > > > > >>> I also lack knowledge of PTFs, so I've only read the motivation
>>> >>> > > > > > >>> part:
>>> >>> > > > > > >>>
>>> >>> > > > > > >>> "The SQL 2016 standard introduced a way of defining custom SQL
>>> >>> > > > > > >>> operators defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic
>>> >>> > > > > > >>> table functions).
>>> >>> > > > > > >>> ~200 pages define how this new kind of function can consume and
>>> >>> > > > > > >>> produce tables with various execution properties.
>>> >>> > > > > > >>> Unfortunately, this part of the standard is not publicly
>>> >>> > > > > > >>> available."
>>> >>> > > > > > >>>
>>> >>> > > > > > >>> Of course we can take a look at some examples, but do we really
>>> >>> > > > > > >>> want to expose state data with this construct, which is described
>>> >>> > > > > > >>> in ~200 pages and is part of a standard that is not publicly
>>> >>> > > > > > >>> available? 🙂
>>> >>> > > > > > >>> I mean, the dataset is a couple of rows and the use-case is
>>> >>> > > > > > >>> joining it with another table, such as state data.
>>> >>> > > > > > >>> If somebody can show the advantages I would buy that, but from my
>>> >>> > > > > > >>> limited understanding this would be overkill here.
>>> >>> > > > > > >>>
>>> >>> > > > > > >>> BR,
>>> >>> > > > > > >>> G
>>> >>> > > > > > >>>
>>> >>> > > > > > >>>
>>> >>> > > > > > >>> On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>> >>> > > > > > >>>
>>> >>> > > > > > >>>> Hi Zakelly, Shengkai!
>>> >>> > > > > > >>>>
>>> >>> > > > > > >>>> I don't know too much about PTFs, it would be interesting to see
>>> >>> > > > > > >>>> how the usage would look in practice.
>>> >>> > > > > > >>>>
>>> >>> > > > > > >>>> Do you have some mockup/example in mind of how the PTF would
>>> >>> > > > > > >>>> look, for example when we want to:
>>> >>> > > > > > >>>> - Simply display/aggregate what's in the metadata
>>> >>> > > > > > >>>> - Join keyed state with some metadata columns
>>> >>> > > > > > >>>>
>>> >>> > > > > > >>>> Thanks
>>> >>> > > > > > >>>> Gyula
>>> >>> > > > > > >>>>
>>> >>> > > > > > >>>> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>>> >>> > > > > > >>>>
>>> >>> > > > > > >>>>> Hi everyone,
>>> >>> > > > > > >>>>>
>>> >>> > > > > > >>>>> I'm fine with a separate SQL connector for metadata, so maybe
>>> >>> > > > > > >>>>> we could update the FLIP about our discussion? And Shengkai
>>> >>> > > > > > >>>>> provides a PTF implementation, does that also meet the
>>> >>> > > > > > >>>>> requirement?
>>> >>> > > > > > >>>>>
>>> >>> > > > > > >>>>>
>>> >>> > > > > > >>>>> Best,
>>> >>> > > > > > >>>>> Zakelly
>>> >>> > > > > > >>>>>
>>> >>> > > > > > >>>>> On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> >>> > > > > > >>>>>
>>> >>> > > > > > >>>>>> Hi All,
>>> >>> > > > > > >>>>>>
>>> >>> > > > > > >>>>>> @Zakelly: Gyula summarised what I meant correctly, so please
>>> >>> > > > > > >>>>>> treat the content as mine.
>>> >>> > > > > > >>>>>> In addition, I'm not against adding a CLI at all, I'm just
>>> >>> > > > > > >>>>>> stating that in some cases like this, users would like to have
>>> >>> > > > > > >>>>>> a self-serving solution where they can provide SQL statements
>>> >>> > > > > > >>>>>> which can trigger alerts automatically.
>>> >>> > > > > > >>>>>>
>>> >>> > > > > > >>>>>> My personal opinion is that a CLI would be beneficial in
>>> >>> > > > > > >>>>>> several cases. A good example is when users want to restart a
>>> >>> > > > > > >>>>>> job from specific Kafka offsets which are persisted in a
>>> >>> > > > > > >>>>>> savepoint. In such a scenario users are more than happy, since
>>> >>> > > > > > >>>>>> they expect manual intervention with full control. So all in
>>> >>> > > > > > >>>>>> all, one can count on my +1 when a CLI FLIP comes up...
>>> >>> > > > > > >>>>>>
>>> >>> > > > > > >>>>>> BR,
>>> >>> > > > > > >>>>>> G
>>> >>> > > > > > >>>>>>
>>> >>> > > > > > >>>>>>
>>> >>> > > > > > >>>>>> On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>
>>> >>> > > > > > >>>>>>> Hi!
>>> >>> > > > > > >>>>>>>
>>> >>> > > > > > >>>>>>> @Zakelly Lan <zakelly....@gmail.com>
>>> >>> > > > > > >>>>>>> I think what Gabor means is that users want to have predefined
>>> >>> > > > > > >>>>>>> SQL scripts to perform state analysis tasks to debug/identify
>>> >>> > > > > > >>>>>>> problems, such as a SQL script that joins the metadata table
>>> >>> > > > > > >>>>>>> with the state and does some analytics on it.
>>> >>> > > > > > >>>>>>>
>>> >>> > > > > > >>>>>>> If we have a meta table, then the SQL script that can do this
>>> >>> > > > > > >>>>>>> is fixed and users can trigger it on demand by simply
>>> >>> > > > > > >>>>>>> providing a new savepoint path.
>>> >>> > > > > > >>>>>>>
>>> >>> > > > > > >>>>>>> If we have a different mechanism to extract metadata that is
>>> >>> > > > > > >>>>>>> not SQL native, then manual steps need to be executed and a
>>> >>> > > > > > >>>>>>> custom SQL script would need to be written that adds the
>>> >>> > > > > > >>>>>>> manually extracted metadata into the script.
>>> >>> > > > > > >>>>>>>
>>> >>> > > > > > >>>>>>> Cheers,
>>> >>> > > > > > >>>>>>> Gyula
>>> >>> > > > > > >>>>>>>
>>> >>> > > > > > >>>>>>> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>
>>> >>> > > > > > >>>>>>>> Hi all,
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>> Thanks for your answers! Getting everyone aligned on this
>>> >>> > > > > > >>>>>>>> topic is challenging, but it’s definitely worth the effort
>>> >>> > > > > > >>>>>>>> since it will help streamline things moving forward.
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>> @Gabor are you saying that users are using some scripts to
>>> >>> > > > > > >>>>>>>> define the SQL metadata connector and get the information,
>>> >>> > > > > > >>>>>>>> right? If so, would a CLI tool be more convenient? It's easy
>>> >>> > > > > > >>>>>>>> to invoke and can get the result swiftly. And there should be
>>> >>> > > > > > >>>>>>>> some other systems to track the checkpoint lineage and
>>> >>> > > > > > >>>>>>>> analyze if there are outliers in metadata (e.g. state size of
>>> >>> > > > > > >>>>>>>> one operator), right? Well, maybe I missed something, so
>>> >>> > > > > > >>>>>>>> please correct me if I'm wrong.
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL
>>> >>> > > > > > >>>>>>>>> native environment where we can serve complex use-cases like
>>> >>> > > > > > >>>>>>>>> you would expect in a regular database.
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>> @Gyula Well, this is a good point. From the perspective of a
>>> >>> > > > > > >>>>>>>> comprehensive SQL experience, I'd +1 for treating metadata as
>>> >>> > > > > > >>>>>>>> data. Although I doubt there is a need for processing
>>> >>> > > > > > >>>>>>>> metadata, I won't be against a separate connector.
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>> Regarding the CLI tool, I still think it’s worth
>>> >>> > > > > > >>>>>>>> implementing. Such a tool could provide savepoint information
>>> >>> > > > > > >>>>>>>> before resuming from a savepoint, which would enhance the
>>> >>> > > > > > >>>>>>>> user experience in CLI-based workflows. It would be good if
>>> >>> > > > > > >>>>>>>> someone could implement this feature. We shouldn’t worry
>>> >>> > > > > > >>>>>>>> about whether this tool might be retired in the future.
>>> >>> > > > > > >>>>>>>> Regardless of the SQL-based solution we eventually adopt,
>>> >>> > > > > > >>>>>>>> this capability will remain essential for CLI users. This is
>>> >>> > > > > > >>>>>>>> another topic.
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>> Best,
>>> >>> > > > > > >>>>>>>> Zakelly
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>
>>> >>> > > > > > >>>>>>>>> Hi.
>>> >>> > > > > > >>>>>>>>>
>>> >>> > > > > > >>>>>>>>> After reading the doc[1], I think Spark provides a function
>>> >>> > > > > > >>>>>>>>> for users to consume the metadata from the savepoint. In
>>> >>> > > > > > >>>>>>>>> Flink SQL, similar functionality is implemented through
>>> >>> > > > > > >>>>>>>>> Polymorphic Table Functions (PTF) as proposed in
>>> >>> > > > > > >>>>>>>>> FLIP-440[2]. Below is a code example[3] illustrating this
>>> >>> > > > > > >>>>>>>>> concept:
>>> >>> > > > > > >>>>>>>>>
>>> >>> > > > > > >>>>>>>>> ```
>>> >>> > > > > > >>>>>>>>>    public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
>>> >>> > > > > > >>>>>>>>>        public void eval(Integer i, Boolean b) {
>>> >>> > > > > > >>>>>>>>>            collectObjects(i, b);
>>> >>> > > > > > >>>>>>>>>        }
>>> >>> > > > > > >>>>>>>>>    }
>>> >>> > > > > > >>>>>>>>> ```
>>> >>> > > > > > >>>>>>>>>
>>> >>> > > > > > >>>>>>>>> ```
>>> >>> > > > > > >>>>>>>>> INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
>>> >>> > > > > > >>>>>>>>> ```
>>> >>> > > > > > >>>>>>>>>
>>> >>> > > > > > >>>>>>>>> So we can add a builtin function named `read_state_metadata`
>>> >>> > > > > > >>>>>>>>> to read savepoint data.
>>> >>> > > > > > >>>>>>>>>
>>> >>> > > > > > >>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>> Shengkai
>>> >>> > > > > > >>>>>>>>>
>>> >>> > > > > > >>>>>>>>> [1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
>>> >>> > > > > > >>>>>>>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
>>> >>> > > > > > >>>>>>>>> [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
>>> >>> > > > > > >>>>>>>>>
>>> >>> > > > > > >>>>>>>>> On Wed, Mar 19, 2025 at 6:37 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> Hi All!
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> Thank you for the answers and concerns from everyone.
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> On the CLI vs State Metadata Connector/Table question, I
>>> >>> > > > > > >>>>>>>>>> would also like to step back a little and look at the
>>> >>> > > > > > >>>>>>>>>> bigger picture.
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL
>>> >>> > > > > > >>>>>>>>>> native environment where we can serve complex use-cases
>>> >>> > > > > > >>>>>>>>>> like you would expect in a regular database.
>>> >>> > > > > > >>>>>>>>>> Most features and developments in recent years have gone
>>> >>> > > > > > >>>>>>>>>> this way.
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> The State Metadata Table would be a natural and
>>> >>> > > > > > >>>>>>>>>> straightforward fit here.
>>> >>> > > > > > >>>>>>>>>> So from my side, +1 for that.
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> However, I could understand if we are not ready to add a
>>> >>> > > > > > >>>>>>>>>> new connector/format due to maintenance concerns (and
>>> >>> > > > > > >>>>>>>>>> general concerns about the design).
>>> >>> > > > > > >>>>>>>>>> If that's the issue, then we should spend more time on the
>>> >>> > > > > > >>>>>>>>>> design to get comfortable with the approach and seek
>>> >>> > > > > > >>>>>>>>>> feedback from the wider community.
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> I am -1 for the CLI/tooling approach, as it will not
>>> >>> > > > > > >>>>>>>>>> provide any of the feature set we are looking for beyond
>>> >>> > > > > > >>>>>>>>>> what is already covered by the Java connector. And that
>>> >>> > > > > > >>>>>>>>>> approach would come with the same maintenance implications.
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> Cheers
>>> >>> > > > > > >>>>>>>>>> Gyula
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>> Hi Zakelly, Shengkai
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>> Several topics are going on, so I'm adding short answers
>>> >>> > > > > > >>>>>>>>>>> to them. If some topic is not touched, please highlight
>>> >>> > > > > > >>>>>>>>>>> it.
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>> @Shengkai: I've read through all the previous FLIPs
>>> >>> > > > > > >>>>>>>>>>> related to catalogs, and if we would like to keep the
>>> >>> > > > > > >>>>>>>>>>> concepts there, then a one-to-one mapping between
>>> >>> > > > > > >>>>>>>>>>> savepoint and catalog is a reasonable direction. In short,
>>> >>> > > > > > >>>>>>>>>>> I'm happy that you've highlighted this and I agree as a
>>> >>> > > > > > >>>>>>>>>>> whole. I've written it down previously, I just want to
>>> >>> > > > > > >>>>>>>>>>> double confirm that the state catalog is essential and
>>> >>> > > > > > >>>>>>>>>>> planned. When we reach this point, your input is more
>>> >>> > > > > > >>>>>>>>>>> than welcome.
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>> @Zakelly: We've tried the CLI and separate library
>>> >>> > > > > > >>>>>>>>>>> approaches with users already, and they were not
>>> >>> > > > > > >>>>>>>>>>> welcomed, for the following reasons:
>>> >>> > > > > > >>>>>>>>>>> * Users want automated tasks, not manual parsing of
>>> >>> > > > > > >>>>>>>>>>> CLI/library output. This can be hacked around, but our
>>> >>> > > > > > >>>>>>>>>>> experience with that is negative because it's just
>>> >>> > > > > > >>>>>>>>>>> brittle.
>>> >>> > > > > > >>>>>>>>>>> * From a development perspective it's a much bigger
>>> >>> > > > > > >>>>>>>>>>> effort than a connector (hard to test, packaging/version
>>> >>> > > > > > >>>>>>>>>>> handling is an extra layer of complexity, external FS
>>> >>> > > > > > >>>>>>>>>>> authentication is a pain for users, and they are also
>>> >>> > > > > > >>>>>>>>>>> expected to download savepoints).
>>> >>> > > > > > >>>>>>>>>>> * Purely personal opinion, but if we find better ways
>>> >>> > > > > > >>>>>>>>>>> later, retiring a CLI is no more lightweight than
>>> >>> > > > > > >>>>>>>>>>> retiring a connector.
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> It would be great if you give some examples on how user
>>> >>> > > > > > >>>>>>>>>>>> could leverage the separate connector to process the
>>> >>> > > > > > >>>>>>>>>>>> metadata.
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>> The simplest cases:
>>> >>> > > > > > >>>>>>>>>>> * give me the overgrowing state uids
>>> >>> > > > > > >>>>>>>>>>> * give me the unknown (new or renamed) state uids
>>> >>> > > > > > >>>>>>>>>>> * give me the state uids where the state size dropped
>>> >>> > > > > > >>>>>>>>>>> drastically compared to a previous savepoint (accidental
>>> >>> > > > > > >>>>>>>>>>> state loss)
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>> Since it was mentioned: as a general off-topic teaser,
>>> >>> > > > > > >>>>>>>>>>> yeah, it would be good to have some sort of
>>> >>> > > > > > >>>>>>>>>>> checkpoint/savepoint lineage, or however we call it.
>>> >>> > > > > > >>>>>>>>>>> Since we've not yet reached this point there are no
>>> >>> > > > > > >>>>>>>>>>> technical details, it's more like a vision. It's a common
>>> >>> > > > > > >>>>>>>>>>> pattern that jobs are physically running but somehow the
>>> >>> > > > > > >>>>>>>>>>> state processing is stuck, and it would be good to add
>>> >>> > > > > > >>>>>>>>>>> some way to find that out automatically.
>>> >>> > > > > > >>>>>>>>>>> The key point here is automation rather than manual
>>> >>> > > > > > >>>>>>>>>>> evaluation, since handling 10k+ jobs simply doesn't allow
>>> >>> > > > > > >>>>>>>>>>> the latter.
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>> G
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>> On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> Hi, All.
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> About State Catalog, I want to share more thoughts about
>>> >>> > > > > > >>>>>>>>>>>> this.
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> In the initial design concept, I understood that a
>>> >>> > > > > > >>>>>>>>>>>> savepoint and a state catalog have a one-to-one mapping
>>> >>> > > > > > >>>>>>>>>>>> relationship. Each operator corresponds to a database,
>>> >>> > > > > > >>>>>>>>>>>> and the state of each operator is represented as
>>> >>> > > > > > >>>>>>>>>>>> individual tables. The rationale behind this design is:
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> *State Diversity*: An operator may involve multiple
>>> >>> > > > > > >>>>>>>>>>>> types of states. For example, in our VVR design, a
>>> >>> > > > > > >>>>>>>>>>>> "multi-join" operator uses keyed states for two input
>>> >>> > > > > > >>>>>>>>>>>> streams and a broadcast state for the third stream. This
>>> >>> > > > > > >>>>>>>>>>>> makes it challenging to represent all states of an
>>> >>> > > > > > >>>>>>>>>>>> operator within a single table.
>>> >>> > > > > > >>>>>>>>>>>> *Scalability*: Internally, an operator might have
>>> >>> > > > > > >>>>>>>>>>>> multiple keyed states (e.g., value state and list
>>> >>> > > > > > >>>>>>>>>>>> state). However, large list states may not fit entirely
>>> >>> > > > > > >>>>>>>>>>>> in memory. To address this, we recommend implementing
>>> >>> > > > > > >>>>>>>>>>>> each state as a separate table.
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> To resolve the loosely coupled relationships
>>> >>> between
>>> >>> > > > > > >>>> operator
>>> >>> > > > > > >>>>>>>> states,
>>> >>> > > > > > >>>>>>>>>> we
>>> >>> > > > > > >>>>>>>>>>>> propose embedding predefined views within the
>>> >>> catalog.
>>> >>> > > > > > >>>> These
>>> >>> > > > > > >>>>>>> views
>>> >>> > > > > > >>>>>>>>>>> simplify
>>> >>> > > > > > >>>>>>>>>>>> user understanding of operator
>>> implementations and
>>> >>> > > > > > >> provide
>>> >>> > > > > > >>>> a
>>> >>> > > > > > >>>>>>> more
>>> >>> > > > > > >>>>>>>>>>> intuitive
>>> >>> > > > > > >>>>>>>>>>>> perspective. For instance, a join operator may
>>> >>> have
>>> >>> > > > > > >>>> multiple
>>> >>> > > > > > >>>>>>> state
>>> >>> > > > > > >>>>>>>>>>>> implementations (depending on whether the
>>> join key
>>> >>> > > > > > >> includes
>>> >>> > > > > > >>>>>>> unique
>>> >>> > > > > > >>>>>>>>>>>> attributes), but users primarily care about
>>> the
>>> >>> data
>>> >>> > > > > > >>>>> associated
>>> >>> > > > > > >>>>>>>> with
>>> >>> > > > > > >>>>>>>>> a
>>> >>> > > > > > >>>>>>>>>>>> specific join key across input streams.
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> Returning to the one-to-one mapping between
>>> >>> savepoints
>>> >>> > > > > > >> and
>>> >>> > > > > > >>>>>>>> catalogs,
>>> >>> > > > > > >>>>>>>>> we
>>> >>> > > > > > >>>>>>>>>>> aim
>>> >>> > > > > > >>>>>>>>>>>> to manage multiple user state catalogs
>>> through a
>>> >>> > catalog
>>> >>> > > > > > >>>>> store.
>>> >>> > > > > > >>>>>>>> When
>>> >>> > > > > > >>>>>>>>> a
>>> >>> > > > > > >>>>>>>>>>> user
>>> >>> > > > > > >>>>>>>>>>>> triggers a savepoint for a job on the
>>> platform:
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> 1. The platform sends a REST request to the
>>> >>> > JobManager.
>>> >>> > > > > > >>>>>>>>>>>> 2. Simultaneously, it registers a new state
>>> >>> catalog in
>>> >>> > > > > > >> the
>>> >>> > > > > > >>>>>>> catalog
>>> >>> > > > > > >>>>>>>>>> store,
>>> >>> > > > > > >>>>>>>>>>>> enabling immediate analysis of state data on
>>> the
>>> >>> > > > > > >> platform.
>>> >>> > > > > > >>>>>>>>>>>> 3. Deleting a savepoint would also trigger the
>>> >>> removal
>>> >>> > > of
>>> >>> > > > > > >>>> its
>>> >>> > > > > > >>>>>>>>>> associated
>>> >>> > > > > > >>>>>>>>>>>> catalog.
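The savepoint-to-catalog lifecycle described in the three steps above can be sketched as follows. This is only an illustrative in-memory model, not Flink code: the class `StateCatalogStore`, its method names, and the `savepoint-<id>` naming are assumptions made for the example.

```python
class StateCatalogStore:
    """Toy in-memory catalog store: exactly one state catalog per savepoint."""

    def __init__(self):
        self._catalogs = {}  # savepoint id -> catalog description

    def on_savepoint_triggered(self, savepoint_id, operators):
        # Each operator maps to a database; each of its states maps to a table.
        name = f"savepoint-{savepoint_id}"
        self._catalogs[savepoint_id] = {
            "name": name,
            "databases": {op: sorted(states) for op, states in operators.items()},
        }
        return name

    def on_savepoint_deleted(self, savepoint_id):
        # Deleting a savepoint also removes its associated catalog.
        self._catalogs.pop(savepoint_id, None)

    def list_catalogs(self):
        return sorted(c["name"] for c in self._catalogs.values())


store = StateCatalogStore()
store.on_savepoint_triggered("42", {"join": {"left_state", "right_state"}})
store.on_savepoint_triggered("43", {"join": {"left_state", "right_state"}})
store.on_savepoint_deleted("42")
print(store.list_catalogs())
```

Keeping the mapping keyed by savepoint id is what makes registration and removal symmetric in this sketch.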
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> This vision assumes that states are
>>> >>> self-describing or
>>> >>> > > > > > >>>> that a
>>> >>> > > > > > >>>>>>> state
>>> >>> > > > > > >>>>>>>>>>>> metaservice is introduced to analyze savepoint
>>> >>> > > > > > >> structures.
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> How can users create logic to identify
>>> >>> differences
>>> >>> > > > > > >>>> between
>>> >>> > > > > > >>>>>>>> multiple
>>> >>> > > > > > >>>>>>>>>>>> savepoints?
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> Since savepoints and state catalogs are
>>> one-to-one
>>> >>> > > > > > >> mapped,
>>> >>> > > > > > >>>>> users
>>> >>> > > > > > >>>>>>>> can
>>> >>> > > > > > >>>>>>>>>>> query
>>> >>> > > > > > >>>>>>>>>>>> metadata via their respective catalogs. For
>>> >>> example:
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>`
>>> >>> > > > > > >>>>>>>>>>>> provides operator-specific metadata (e.g., state size, type).
>>> >>> > > > > > >>>>>>>>>>>> 2. Comparing metadata tables (e.g., schema
>>> >>> versions,
>>> >>> > > > > > >> state
>>> >>> > > > > > >>>>> entry
>>> >>> > > > > > >>>>>>>>>> counts)
>>> >>> > > > > > >>>>>>>>>>>> across catalogs reveals structural or
>>> quantitative
>>> >>> > > > > > >>>>> differences.
>>> >>> > > > > > >>>>>>>>>>>> 3. For deeper analysis, users could write SQL
>>> >>> queries
>>> >>> > to
>>> >>> > > > > > >>>>> compare
>>> >>> > > > > > >>>>>>>>>> specific
>>> >>> > > > > > >>>>>>>>>>>> state partitions or leverage the metaservice
>>> to
>>> >>> track
>>> >>> > > > > > >> state
>>> >>> > > > > > >>>>>>>> evolution
>>> >>> > > > > > >>>>>>>>>>>> (e.g., added/removed operators, modified state
>>> >>> > > > > > >>>>> configurations).
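A minimal sketch of the metadata comparison described in the list above. SQLite stands in for Flink SQL here, and the table names (`sp1_metadata`, `sp2_metadata`) and columns (`operator_name`, `state_size_bytes`) are hypothetical; the thread does not define a concrete metadata schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-savepoint metadata tables; names and columns are illustrative only.
for table in ("sp1_metadata", "sp2_metadata"):
    conn.execute(
        f"CREATE TABLE {table} (operator_name TEXT PRIMARY KEY, state_size_bytes INTEGER)"
    )
conn.executemany(
    "INSERT INTO sp1_metadata VALUES (?, ?)",
    [("source", 100), ("join", 5000), ("dedup", 300)],
)
conn.executemany(
    "INSERT INTO sp2_metadata VALUES (?, ?)",
    [("source", 120), ("join", 9000), ("sink", 10)],
)

# Quantitative difference: per-operator state size change between the savepoints.
size_changes = conn.execute(
    """
    SELECT a.operator_name, b.state_size_bytes - a.state_size_bytes AS delta
    FROM sp1_metadata a
    JOIN sp2_metadata b ON a.operator_name = b.operator_name
    ORDER BY a.operator_name
    """
).fetchall()

# Structural difference: operators removed from or added to the job.
removed = [r[0] for r in conn.execute(
    "SELECT operator_name FROM sp1_metadata "
    "WHERE operator_name NOT IN (SELECT operator_name FROM sp2_metadata) "
    "ORDER BY operator_name"
)]
added = [r[0] for r in conn.execute(
    "SELECT operator_name FROM sp2_metadata "
    "WHERE operator_name NOT IN (SELECT operator_name FROM sp1_metadata) "
    "ORDER BY operator_name"
)]
print(size_changes, removed, added)
```

With one metadata table per savepoint, both kinds of difference reduce to ordinary joins and anti-joins.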
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> If we plan to introduce a state catalog in the
>>> >>> > future, I
>>> >>> > > > > > >>>> would
>>> >>> > > > > > >>>>>>> lean
>>> >>> > > > > > >>>>>>>>>>> toward
>>> >>> > > > > > >>>>>>>>>>>> using metadata tables. If a utility tool can
>>> >>> address
>>> >>> > the
>>> >>> > > > > > >>>>>>> challenges
>>> >>> > > > > > >>>>>>>>> we
>>> >>> > > > > > >>>>>>>>>>>> face, could we avoid introducing an additional
>>> >>> > > connector?
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>> Shengkai
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> Gyula Fóra <gyula.f...@gmail.com> wrote on Mon, Mar 17, 2025 at 20:25:
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> Hi All!
>>> >>> > > > > > >>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> Without going into too much detail, here are my 2 cents regarding the
>>> >>> > > > > > >>>>>>>>>>>>> virtual column / catalog metadata / table (connector) discussion for
>>> >>> > > > > > >>>>>>>>>>>>> the state metadata.
>>> >>> > > > > > >>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> State metadata, such as the types of states and their properties,
>>> >>> > > > > > >>>>>>>>>>>>> names, and sizes, is valuable information that can be used to enrich
>>> >>> > > > > > >>>>>>>>>>>>> the computations we do on state.
>>> >>> > > > > > >>>>>>>>>>>>> We can analyze it standalone (for example, to discover anomalies in
>>> >>> > > > > > >>>>>>>>>>>>> large jobs with many states), across multiple savepoints (to discover
>>> >>> > > > > > >>>>>>>>>>>>> how state changed over time), or by joining it with keyed or non-keyed
>>> >>> > > > > > >>>>>>>>>>>>> state data to serve more complex queries on the state.
>>> >>> > > > > > >>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> The only solution that seems to serve all these use cases and
>>> >>> > > > > > >>>>>>>>>>>>> requirements in a straightforward and SQL-canonical way is to simply
>>> >>> > > > > > >>>>>>>>>>>>> expose the state metadata as a separate table. This is a metadata
>>> >>> > > > > > >>>>>>>>>>>>> table, but you can also think of it as a data table; it makes no
>>> >>> > > > > > >>>>>>>>>>>>> practical difference here.
>>> >>> > > > > > >>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> Once we have a catalog later, the catalog can offer this table out of
>>> >>> > > > > > >>>>>>>>>>>>> the box, the same way databases provide metadata tables. For this to
>>> >>> > > > > > >>>>>>>>>>>>> work, however, we need another, simpler connector that creates this
>>> >>> > > > > > >>>>>>>>>>>>> table.
>>> >>> > > > > > >>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> +1 for state metadata as a separate connector/table, instead of
>>> >>> > > > > > >>>>>>>>>>>>> adding virtual columns and ad hoc catalog metadata that is hard to
>>> >>> > > > > > >>>>>>>>>>>>> use in a large number of queries.
>>> >>> > > > > > >>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> Cheers,
>>> >>> > > > > > >>>>>>>>>>>>> Gyula
>>> >>> > > > > > >>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>> On Mon, Mar 17, 2025 at 12:44 PM Gabor
>>> Somogyi <
>>> >>> > > > > > >>>>>>>>>>>> gabor.g.somo...@gmail.com>
>>> >>> > > > > > >>>>>>>>>>>>> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> 1. State TTL for Value Columns
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>> I’m planning on adding this, and we may
>>> >>> collaborate
>>> >>> > > > > > >>>> on
>>> >>> > > > > > >>>>> it
>>> >>> > > > > > >>>>>>> in
>>> >>> > > > > > >>>>>>>>> the
>>> >>> > > > > > >>>>>>>>>>>>> future.
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> +1 on this, just ping me.
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> After some code digging and a POC, all I can say is that with heavy
>>> >>> > > > > > >>>>>>>>>>>>>> effort we could maybe make changes that let us show the metadata of
>>> >>> > > > > > >>>>>>>>>>>>>> a savepoint from the catalog.
>>> >>> > > > > > >>>>>>>>>>>>>> I'm not against that, but from the user's perspective this has
>>> >>> > > > > > >>>>>>>>>>>>>> limited value; let me explain why.
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> From a high-level perspective, I see agreement on the following:
>>> >>> > > > > > >>>>>>>>>>>>>> * We should have a catalog that represents the savepoint data set
>>> >>> > > > > > >>>>>>>>>>>>>> of one or more jobs (future plan)
>>> >>> > > > > > >>>>>>>>>>>>>> * Savepoints should be registrable in the catalog, where they then
>>> >>> > > > > > >>>>>>>>>>>>>> appear as databases (future plan)
>>> >>> > > > > > >>>>>>>>>>>>>> * There must be a possibility to create tables from databases where
>>> >>> > > > > > >>>>>>>>>>>>>> users can read state data (exists already)
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> In terms of metadata, if I understand correctly, the suggested
>>> >>> > > > > > >>>>>>>>>>>>>> approach would be to access it through the catalog describe
>>> >>> > > > > > >>>>>>>>>>>>>> command, right? Adding that info when a specific database describe
>>> >>> > > > > > >>>>>>>>>>>>>> command is executed could be done.
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> The question is, for instance, how users can create logic that
>>> >>> > > > > > >>>>>>>>>>>>>> tells them what the difference is between multiple savepoints.
>>> >>> > > > > > >>>>>>>>>>>>>> Just to give some examples:
>>> >>> > > > > > >>>>>>>>>>>>>> * per operator size changes between
>>> savepoints
>>> >>> > > > > > >>>>>>>>>>>>>> * show values from operator data where state
>>> >>> size
>>> >>> > > > > > >>>> reaches
>>> >>> > > > > > >>>>> a
>>> >>> > > > > > >>>>>>>>>> boundary
>>> >>> > > > > > >>>>>>>>>>>>>> * in general, "find which checkpoint ruined things" is quite a
>>> >>> > > > > > >>>>>>>>>>>>>> common pattern
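As a rough illustration of the "find which checkpoint ruined things" pattern from the examples above, the sketch below scans per-savepoint metadata in order and reports the first savepoint where an operator's state size crosses a boundary. The function name, the shape of `history`, and the sample numbers are all assumptions made for this example.

```python
def first_savepoint_over_boundary(history, operator, boundary_bytes):
    """Return the id of the first savepoint whose state size for `operator`
    exceeds `boundary_bytes`, or None if no savepoint crosses the boundary."""
    for savepoint_id, sizes in history:
        if sizes.get(operator, 0) > boundary_bytes:
            return savepoint_id
    return None


# Hypothetical per-savepoint metadata: operator name -> state size in bytes.
history = [
    ("sp-1", {"join": 5_000, "agg": 200}),
    ("sp-2", {"join": 5_200, "agg": 210}),
    ("sp-3", {"join": 90_000, "agg": 220}),
    ("sp-4", {"join": 95_000, "agg": 230}),
]

culprit = first_savepoint_over_boundary(history, "join", 50_000)
print(culprit)
```

This only works if the metadata is available as data that user logic can iterate over, which is the point being made in this message.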
>>> >>> > > > > > >>>>>>>>>>>>>> What I would like to highlight here is that, from Flink's point of
>>> >>> > > > > > >>>>>>>>>>>>>> view, the metadata can be considered static side-output
>>> >>> > > > > > >>>>>>>>>>>>>> information, but for users these values are real data that they
>>> >>> > > > > > >>>>>>>>>>>>>> plan to build logic around.
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>> The metadata is more like one-time
>>> information
>>> >>> > > > > > >>>> instead
>>> >>> > > > > > >>>>> of
>>> >>> > > > > > >>>>>>> a
>>> >>> > > > > > >>>>>>>>>>> streaming
>>> >>> > > > > > >>>>>>>>>>>>>> data that changes all
>>> >>> > > > > > >>>>>>>>>>>>>> the time, so a single connector seems to be
>>> an
>>> >>> > > > > > >>>> overkill.
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> State data is also static within a savepoint, and that's the reason
>>> >>> > > > > > >>>>>>>>>>>>>> why the state processor API works in batch mode.
>>> >>> > > > > > >>>>>>>>>>>>>> When we handle multiple checkpoints in a streaming fashion, this
>>> >>> > > > > > >>>>>>>>>>>>>> can be viewed from another angle.
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> We can come up with a more lightweight solution than a new
>>> >>> > > > > > >>>>>>>>>>>>>> connector, but forcing users to parse the catalog describe command
>>> >>> > > > > > >>>>>>>>>>>>>> output in order to compare multiple savepoints doesn't sound like a
>>> >>> > > > > > >>>>>>>>>>>>>> smooth user experience.
>>> >>> > > > > > >>>>>>>>>>>>>> Honestly, I have no other idea how to expose metadata as real user
>>> >>> > > > > > >>>>>>>>>>>>>> data, so I'm waiting on other approaches.
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>>>>> G
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> On Thu, Mar 13, 2025 at 2:44 AM Shengkai
>>> Fang <
>>> >>> > > > > > >>>>>>>> fskm...@gmail.com
>>> >>> > > > > > >>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>> Looking forward to hearing the good news!
>>> >>> > > > > > >>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>> Shengkai
>>> >>> > > > > > >>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Wed, Mar 12, 2025 at 22:24:
>>> >>> > > > > > >>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>> Thanks for both the valuable input!
>>> >>> > > > > > >>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>> Let me take a closer look at the
>>> suggestions,
>>> >>> > > > > > >> like
>>> >>> > > > > > >>>> the
>>> >>> > > > > > >>>>>>>>> Catalog
>>> >>> > > > > > >>>>>>>>>>>>>>> capabilities
>>> >>> > > > > > >>>>>>>>>>>>>>>> and possibility of embedding
>>> TypeInformation
>>> >>> or
>>> >>> > > > > > >>>>>>>>>>>>>>>> StateDescriptor metadata directly into
>>> the raw
>>> >>> > > > > > >>>> state
>>> >>> > > > > > >>>>>>>> files...
>>> >>> > > > > > >>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>>>>>>> G
>>> >>> > > > > > >>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 8:17 AM Shengkai
>>> Fang
>>> >>> <
>>> >>> > > > > > >>>>>>>>>> fskm...@gmail.com
>>> >>> > > > > > >>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> Thanks for Zakelly's clarification.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> +1 to delay the discussion about this.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> I’d like to share my perspective on the
>>> State
>>> >>> > > > > > >>>>> Catalog
>>> >>> > > > > > >>>>>>>>>> proposal.
>>> >>> > > > > > >>>>>>>>>>>>> While
>>> >>> > > > > > >>>>>>>>>>>>>>>>> introducing this capability is
>>> beneficial,
>>> >>> > > > > > >> there
>>> >>> > > > > > >>>> is
>>> >>> > > > > > >>>>> a
>>> >>> > > > > > >>>>>>>>>> blocker:
>>> >>> > > > > > >>>>>>>>>>>> the
>>> >>> > > > > > >>>>>>>>>>>>>>>> current
>>> >>> > > > > > >>>>>>>>>>>>>>>>> StateBackend architecture does not permit
>>> >>> > > > > > >>>> operators
>>> >>> > > > > > >>>>> to
>>> >>> > > > > > >>>>>>>>> encode
>>> >>> > > > > > >>>>>>>>>>>>>>>>> TypeInformation into the state—it only
>>> >>> > > > > > >> preserves
>>> >>> > > > > > >>>> the
>>> >>> > > > > > >>>>>>>>>>> Serializer.
>>> >>> > > > > > >>>>>>>>>>>>> This
>>> >>> > > > > > >>>>>>>>>>>>>>>>> limitation creates an asymmetry, as
>>> operators
>>> >>> > > > > > >>>> alone
>>> >>> > > > > > >>>>>>>> retain
>>> >>> > > > > > >>>>>>>>>>>>> knowledge
>>> >>> > > > > > >>>>>>>>>>>>>> of
>>> >>> > > > > > >>>>>>>>>>>>>>>> the
>>> >>> > > > > > >>>>>>>>>>>>>>>>> data structure’s schema.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> To address this, I suggest allowing
>>> operators
>>> >>> > > > > > >> to
>>> >>> > > > > > >>>>> embed
>>> >>> > > > > > >>>>>>>>>>>>>> TypeInformation
>>> >>> > > > > > >>>>>>>>>>>>>>> or
>>> >>> > > > > > >>>>>>>>>>>>>>>>> StateDescriptor metadata directly into
>>> the
>>> >>> raw
>>> >>> > > > > > >>>> state
>>> >>> > > > > > >>>>>>>> files.
>>> >>> > > > > > >>>>>>>>>>> Such
>>> >>> > > > > > >>>>>>>>>>>> a
>>> >>> > > > > > >>>>>>>>>>>>>>> design
>>> >>> > > > > > >>>>>>>>>>>>>>>>> would enable the Catalog to:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> 1. Parse state files and programmatically
>>> >>> > > > > > >> derive
>>> >>> > > > > > >>>> the
>>> >>> > > > > > >>>>>>>> schema
>>> >>> > > > > > >>>>>>>>>> and
>>> >>> > > > > > >>>>>>>>>>>>>>>> structural
>>> >>> > > > > > >>>>>>>>>>>>>>>>> guarantees for each state.
>>> >>> > > > > > >>>>>>>>>>>>>>>>> 2. Leverage existing Flink Table
>>> utilities,
>>> >>> > > > > > >> such
>>> >>> > > > > > >>>> as
>>> >>> > > > > > >>>>>>>>>>>>>>>>> LegacyTypeInfoDataTypeConverter (in
>>> >>> > > > > > >>>>>>>>>>>>>>> org.apache.flink.table.types.utils),
>>> >>> > > > > > >>>>>>>>>>>>>>>> to
>>> >>> > > > > > >>>>>>>>>>>>>>>>> bridge TypeInformation and DataType
>>> >>> > > > > > >> conversions.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> If we cannot store the TypeInformation or StateDescriptor in the
>>> >>> > > > > > >>>>>>>>>>>>>>>>> raw state files, I am +1 for this FLIP using a metadata column
>>> >>> > > > > > >>>>>>>>>>>>>>>>> to retrieve the information.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>>>> Shengkai
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> Zakelly Lan <zakelly....@gmail.com> wrote on Wed, Mar 12, 2025 at 12:43:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> Hi Gabor and Shengkai,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> Thanks for sharing your thoughts! This
>>> is a
>>> >>> > > > > > >>>> long
>>> >>> > > > > > >>>>>>>>> discussion
>>> >>> > > > > > >>>>>>>>>>> and
>>> >>> > > > > > >>>>>>>>>>>>>> sorry
>>> >>> > > > > > >>>>>>>>>>>>>>>> for
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> the late reply (I'm busy catching up
>>> with
>>> >>> > > > > > >>>> release
>>> >>> > > > > > >>>>>>> 2.0
>>> >>> > > > > > >>>>>>>>> these
>>> >>> > > > > > >>>>>>>>>>>>> days).
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> Let me first clarify your thoughts to
>>> ensure
>>> >>> > > > > > >> I
>>> >>> > > > > > >>>>>>>> understand
>>> >>> > > > > > >>>>>>>>>>>>>> correctly.
>>> >>> > > > > > >>>>>>>>>>>>>>>>> IIUC,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> there is no persistent configuration for
>>> >>> > > > > > >> state
>>> >>> > > > > > >>>> TTL
>>> >>> > > > > > >>>>>>> in
>>> >>> > > > > > >>>>>>>> the
>>> >>> > > > > > >>>>>>>>>>>>>> checkpoint.
>>> >>> > > > > > >>>>>>>>>>>>>>>>> While
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> you can infer that TTL is enabled by
>>> reading
>>> >>> > > > > > >>>> the
>>> >>> > > > > > >>>>>>>>>> serializer,
>>> >>> > > > > > >>>>>>>>>>>> the
>>> >>> > > > > > >>>>>>>>>>>>>>>>> checkpoint
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> itself only stores the last access time
>>> for
>>> >>> > > > > > >>>> each
>>> >>> > > > > > >>>>>>> value.
>>> >>> > > > > > >>>>>>>>> So
>>> >>> > > > > > >>>>>>>>>>> the
>>> >>> > > > > > >>>>>>>>>>>>> only
>>> >>> > > > > > >>>>>>>>>>>>>>>> thing
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> we can show is the last access time for
>>> each
>>> >>> > > > > > >>>>> value.
>>> >>> > > > > > >>>>>>> But
>>> >>> > > > > > >>>>>>>>> it
>>> >>> > > > > > >>>>>>>>>> is
>>> >>> > > > > > >>>>>>>>>>>> not
>>> >>> > > > > > >>>>>>>>>>>>>>>>> required
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> for all state backends to store this, as
>>> >>> they
>>> >>> > > > > > >>>> may
>>> >>> > > > > > >>>>>>>>> directly
>>> >>> > > > > > >>>>>>>>>>>> store
>>> >>> > > > > > >>>>>>>>>>>>>> the
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> expired time. This will also increase
>>> the
>>> >>> > > > > > >>>>>>> difficulty of
>>> >>> > > > > > >>>>>>>>>>>>>>> implementation
>>> >>> > > > > > >>>>>>>>>>>>>>>> &
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> maintenance.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> This once again reiterates the
>>> importance of
>>> >>> > > > > > >>>>> unified
>>> >>> > > > > > >>>>>>>>>> metadata
>>> >>> > > > > > >>>>>>>>>>>> for
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> checkpoints. I’m planning on adding
>>> this,
>>> >>> and
>>> >>> > > > > > >>>> we
>>> >>> > > > > > >>>>> may
>>> >>> > > > > > >>>>>>>>>>>> collaborate
>>> >>> > > > > > >>>>>>>>>>>>> on
>>> >>> > > > > > >>>>>>>>>>>>>>> it
>>> >>> > > > > > >>>>>>>>>>>>>>>> in
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> the future.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> I'm not in favor of adding a new connector for metadata. The
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> metadata is more like one-time information than streaming data
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> that changes all the time, so a dedicated connector seems to be
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> overkill. It is not easy to withdraw a connector if we have a
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> better solution in the future. I'm not familiar with the current
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> Catalog capabilities, but if it could extract and show some
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> operator-level information from a savepoint, that would be
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> great.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> If the Catalog can't do that, I would
>>> >>> > > > > > >> consider
>>> >>> > > > > > >>>> the
>>> >>> > > > > > >>>>>>>>> current
>>> >>> > > > > > >>>>>>>>>>> FLIP
>>> >>> > > > > > >>>>>>>>>>>>> to
>>> >>> > > > > > >>>>>>>>>>>>>>> be a
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> compromise solution.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> And if we have that unified metadata for
>>> >>> > > > > > >>>>>>>>>> checkpoint/savepoint
>>> >>> > > > > > >>>>>>>>>>>> in
>>> >>> > > > > > >>>>>>>>>>>>>>>> future,
>>> >>> > > > > > >>>>>>>>>>>>>>>>> we
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> may directly register savepoint in
>>> catalog,
>>> >>> > > > > > >> and
>>> >>> > > > > > >>>>>>> create
>>> >>> > > > > > >>>>>>>> a
>>> >>> > > > > > >>>>>>>>>>> source
>>> >>> > > > > > >>>>>>>>>>>>>>> without
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> specifying complex columns, as well as
>>> >>> > > > > > >> describe
>>> >>> > > > > > >>>>> the
>>> >>> > > > > > >>>>>>>>>> savepoint
>>> >>> > > > > > >>>>>>>>>>>>>> catalog
>>> >>> > > > > > >>>>>>>>>>>>>>>> to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> get the metadata. That's a good
>>> solution in
>>> >>> > > > > > >> my
>>> >>> > > > > > >>>>> mind.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> Zakelly
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 10:35 AM
>>> Shengkai
>>> >>> > > > > > >> Fang
>>> >>> > > > > > >>>> <
>>> >>> > > > > > >>>>>>>>>>>>> fskm...@gmail.com>
>>> >>> > > > > > >>>>>>>>>>>>>>>>> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> Hi Gabor,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with
>>> >>> > > > > > >>>>>>> `savepoint-metadata`
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> I would argue against introducing a new
>>> >>> > > > > > >>>>> connector
>>> >>> > > > > > >>>>>>>> type
>>> >>> > > > > > >>>>>>>>>>> named
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> savepoint-metadata, as the existing
>>> Catalog
>>> >>> > > > > > >>>>>>> mechanism
>>> >>> > > > > > >>>>>>>>> can
>>> >>> > > > > > >>>>>>>>>>>>>>> inherently
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> provide the necessary connector factory
>>> >>> > > > > > >>>>>>> capabilities.
>>> >>> > > > > > >>>>>>>>>> I’ve
>>> >>> > > > > > >>>>>>>>>>>>>> detailed
>>> >>> > > > > > >>>>>>>>>>>>>>>>> this
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> proposal in branch[1]. Please take a
>>> moment
>>> >>> > > > > > >>>> to
>>> >>> > > > > > >>>>>>> review
>>> >>> > > > > > >>>>>>>>> it.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> If we introduce a connector named `savepoint-metadata`, it
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> means users can create a temporary table with the
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> `savepoint-metadata` connector, and the connector needs to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> check whether the table schema is the same as the schema we
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> proposed in the FLIP. On the other
>>> hand,
>>> >>> > > > > > >> it's
>>> >>> > > > > > >>>>> not
>>> >>> > > > > > >>>>>>>> easy
>>> >>> > > > > > >>>>>>>>>> work
>>> >>> > > > > > >>>>>>>>>>>> for
>>> >>> > > > > > >>>>>>>>>>>>>>>> others
>>> >>> > > > > > >>>>>>>>>>>>>>>>> to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> users a metadata table with same
>>> schema.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> Shengkai
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Tue, Mar 11, 2025 at 16:56:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> Directionally, I agree with your idea of how it can be
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> implemented.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> Previously I mentioned that TTL information is not exposed
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> on the state processor API (which the SQL state connector
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> uses to read data),
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> and unless somebody shows me otherwise, this FLIP is not
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> going to address this, to avoid feature creep. Our users are
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> also interested in TTL, so sooner or later we're going to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> expose it; this is a matter of scheduling.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I'm not sure I understand your point about StateCatalog. First of all, I can't agree
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> more that StateCatalog is needed, and it is a planned building block in an upcoming
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> FLIP, but I'm not sure how it can help now. No matter what, your knowledge is
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> essential when we add StateCatalog. Let me lay out my understanding in this area:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * First we need CREATE TABLE statements to access state data and metadata.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * Once we have that, we can add StateCatalog, which could potentially make users'
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>   lives easier by, for example, providing off-the-shelf tables without the need to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>   write CREATE TABLE statements.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> User expectations:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See state data (this is fulfilled by the existing connector)
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See metadata about state data, like TTL (this can be added as a metadata column,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>   as you suggested, since it belongs to the data)
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See metadata about operators (this can be added from savepoint-metadata)
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> It's important to highlight that the state data table format differs from the state
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> metadata table format: one table has rows for state values and the other has rows
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> for operators, right?
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I think that's why you pointed out that the suggested metadata columns are somewhat
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> clunky.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> In conclusion, I agree to add a ${state-name}_ttl metadata column later on, since it
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> belongs to the state value, and to add a new table type for metadata (as you
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> suggested, similar to PG [1]). Please see how Spark does that too [2].
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> If you have a better approach, please elaborate with more details and help me
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> understand your point.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB savepoints that the number of keys can
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> be extremely huge but not the per key state itself.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in a separate
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> jira.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I've just created https://issues.apache.org/jira/browse/FLINK-37456.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> G
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for your response.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Thank you for addressing the limitations here. However, I believe it would
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> be beneficial to further clarify the API in this FLIP regarding how users
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> can specify the TTL column.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> One potential approach that comes to mind is using a standardized naming
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> convention such as ${state-name}_ttl for the metadata column that defines
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> the TTL value. In terms of implementation, the listReadableMetadata
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> function could:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. Read the table’s columns and configuration,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Extract all defined state names, and
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 3. Return a structured list of metadata entries formatted as
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> ${state-name}_ttl.
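
The three steps above could be sketched roughly as follows. This is plain Python for illustration, not the Flink `listReadableMetadata` signature; the per-column option key `fields.<column>.state-name`, the `BIGINT` type for the TTL value, and the column names are all assumptions that do not come from the thread.

```python
def list_readable_metadata(value_columns, table_options):
    """Sketch: derive '<state-name>_ttl' metadata keys for a state table.

    value_columns: names of the table's value columns (step 1, columns)
    table_options: the table's configuration as a dict (step 1, configuration)
    """
    # Step 2: extract the state name behind each value column. The state name
    # is assumed to default to the column name unless overridden by a
    # hypothetical per-column option.
    state_names = []
    for column in value_columns:
        name = table_options.get(f"fields.{column}.state-name", column)
        if name not in state_names:
            state_names.append(name)

    # Step 3: return one metadata entry per state, keyed as <state-name>_ttl.
    # BIGINT (TTL in milliseconds) is an assumed type.
    return {f"{name}_ttl": "BIGINT" for name in state_names}


metadata = list_readable_metadata(
    ["my_value_state", "cnt"],
    {"fields.cnt.state-name": "counter-state"},
)
print(metadata)
```

A user would then be able to declare, for example, a `my_value_state_ttl` metadata column next to the `my_value_state` value column.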
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> WDYT?
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Introducing a new connector type at this stage may unnecessarily complicate
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> the system. Given that every table already belongs to a Catalog, which is
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> designed to provide a Factory for building source or sink connectors, I
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> propose integrating a dedicated StateCatalog instead. This approach would
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> allow us to:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. Leverage the Catalog’s existing capabilities to manage TTL metadata
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> (e.g., state names and TTL logic) without duplicating functionality.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Provide a unified interface for connector instantiation and metadata
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> handling through the Catalog’s Factory pattern.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Would this design decision better align with our architecture’s
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> extensibility and reduce redundancy?
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB savepoints that the number of keys can
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> be extremely huge but not the per key state itself.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in a separate
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> jira.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> +1 for a separate jira.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Shengkai
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> On Mon, Mar 10, 2025 at 19:05, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Please see my comments inline.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> G
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for the FLIP. I have some questions about the FLIP:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> How can users retrieve the state TTL (Time-to-Live) for each value column?
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> From my understanding of the current design, it seems that this
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> functionality is not supported. Could you clarify if there are plans to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> address this limitation?
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Since the State Processor API does not yet expose this information, this
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> would require several steps.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> First, support needs to be added to the State Processor API, which can
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> then be exposed on the SQL API.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This is definitely a useful future improvement and can be handled in a
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> separate jira.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> The metadata information described in the FLIP appears to be intended to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> describe the state files stored at a specific location. To me, this concept
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> aligns more closely with system tables like pg_tables in PostgreSQL [1] or
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> the INFORMATION_SCHEMA in MySQL [2].
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Adding a new connector with `savepoint-metadata` is one way to provide
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> such functionality.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> I'm not against that; I just want us to agree that we would like to move
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> in that direction.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> (As a side note, not just PG but Spark also has a similar approach, and I
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> basically like the idea.)
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> If we go in that direction, savepoint metadata can be exposed so that one
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> row represents an operator with its values, something like this:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┐
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │operatorN│operatorU│operatorH│paralleli│maxParall│subtaskSt│coordinat│totalSta│
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │ame      │id       │ash      │sm       │elism    │atesCount│orStateSi│tesSizeI│
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │         │         │         │         │         │zeInBytes│nBytes  │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │Source:  │datagen-s│47aee9439│2        │128      │2        │16       │546     │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │datagen-s│ource-uid│4d6ea26e2│         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │ource    │         │d544bef0a│         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │         │37bb5    │         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │long-udf-│long-udf-│6ed3f40bf│2        │128      │2        │0        │0       │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │with-mast│with-mast│f3c8dfcdf│         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │er-hook  │er-hook-u│cb95128a1│         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │id       │018f1    │         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │value-pro│value-pro│ca4f5fe9a│2        │128      │2        │0        │40726   │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │cess     │cess-uid │637b656f0│         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │         │9ea78b3e7│         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │         │a15b9    │         │         │         │         │        │
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This table can then be joined with the tables created by the existing
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> `savepoint` connector, based on the UID hash (which is unique and always
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> exists).
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This would mean that the existing table would need only a single
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> metadata column, the UID hash.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> WDYT?
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> @zakelly, please share your thoughts too.
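
The join on UID hash described above might look roughly like this. It is a plain Python sketch rather than Flink SQL: the operator rows reuse values from the example table, while the state rows, the `operator_hash` metadata column name, and the key/value column names are made up for illustration.

```python
# Operator-level rows, as a `savepoint-metadata` table might expose them.
# Column names follow the example table's header; only a subset is shown.
operator_metadata = [
    {"operatorName": "Source: datagen-source",
     "operatorHash": "47aee94394d6ea26e2d544bef0a37bb5",
     "parallelism": 2, "totalStatesSizeInBytes": 546},
    {"operatorName": "value-process",
     "operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9",
     "parallelism": 2, "totalStatesSizeInBytes": 40726},
]

# Rows from a table backed by the existing `savepoint` connector, carrying the
# UID hash as their single metadata column (hypothetical keys and values).
state_rows = [
    {"k": 1, "my_value_state": 10,
     "operator_hash": "ca4f5fe9a637b656f09ea78b3e7a15b9"},
    {"k": 2, "my_value_state": 20,
     "operator_hash": "ca4f5fe9a637b656f09ea78b3e7a15b9"},
]


def join_on_uid_hash(state_rows, operator_metadata):
    """Inner-join state rows with operator rows on the operator UID hash."""
    # The UID hash is unique per operator, so it can serve as a lookup key.
    by_hash = {op["operatorHash"]: op for op in operator_metadata}
    joined = []
    for row in state_rows:
        op = by_hash.get(row["operator_hash"])
        if op is not None:
            joined.append({**row, **op})
    return joined


joined = join_on_uid_hash(state_rows, operator_metadata)
for row in joined:
    print(row["k"], row["my_value_state"], row["operatorName"], row["parallelism"])
```

Because every state row of one operator shares the same hash, the operator-level values are stored once in the metadata table and only attached at query time.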
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> If we opt to use metadata columns, every record in the table would end up
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> having identical values for these columns (please correct me if I’m
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> mistaken). On the other hand, the state connector requires users to specify
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> an operator UID or operator UID hash, after which it outputs user-defined
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> values in its records. This approach feels somewhat redundant to me.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> If we add a new `savepoint-metadata` connector, then this can
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> be addressed.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> On the other hand, UID and UID hash have an either-or
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> relationship from the config perspective, so a user who
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> provides the UID can still be interested in the hash for
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> further calculations (Flink internals depend on the hash).
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Printing out the human-readable UID is an explicit
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> requirement from the user side because hashes are not
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> human-readable.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 3. Handling LIST and MAP States in the State Connector
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> I have concerns about how the current design handles LIST and
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> MAP states. Specifically, the state connector uses Flink SQL’s
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> MAP and ARRAY types, which implies that it attempts to load
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> entire MAP or LIST states into memory.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> However, in many real-world scenarios, these states can grow
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> very large. Typically, the state API addresses this by
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> providing an iterator to traverse elements within the state
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> incrementally. I’m unsure whether I’ve missed something in
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> FLIP-496 or FLIP-512, but it seems that the current design
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> might struggle with scalability in such cases.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> You are right, the current implementation keeps the state for
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> a single key in memory.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> We considered this potential issue earlier and concluded that
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> addressing it is not strictly necessary for the initial
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> version and can be done as a later improvement.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> So far we've seen, even in TB-sized savepoints, that the
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> number of keys can be extremely large but the per-key state
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> itself is not.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> But again, this would be a good feature and can be handled in
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> a separate Jira.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 2:00 AM Gabor Somogyi
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> <gabor.g.somo...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> Hi Zakelly,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> To keep things simple, `METADATA VIRTUAL` as the keywords for
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> the definition is the target.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> If it's not overly complex, the latter
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> (`METADATA FROM xxx VIRTUAL`) can be added too.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> G
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> <zakelly....@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi Gabor,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> +1 for this.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Will the metadata column use `METADATA VIRTUAL` as key words
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> for definition, or `METADATA FROM xxx VIRTUAL` for renaming,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> just like the Kafka table?
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Zakelly
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> <gabor.g.somo...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Hi All,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to start a discussion of FLIP-512: Add meta
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> information to SQL state connector [1].
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Feel free to add your thoughts to make this feature better.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> G
