One more question about the FLIP.

I think the output schema is definitely a public API for users. If users use
the `CREATE FUNCTION` statement, does that mean the class path is also a
public API for users? Alternatively, is this merely an experimental feature
for which we make no guarantees?
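
To make the question concrete, here is a minimal sketch of the two
registration routes. The class name and the module name `state` are
hypothetical placeholders, not names taken from the FLIP, and the sketch
assumes a PTF can be registered with `CREATE FUNCTION` at all, which this
thread has not settled:

```sql
-- Route 1: explicit registration. The fully qualified class name
-- (hypothetical) appears in user-written SQL, so renaming or moving the
-- class later would break existing scripts.
CREATE FUNCTION read_state_metadata
  AS 'org.example.state.ReadStateMetadataFunction';

-- Route 2: a dedicated module (hypothetical name). Users load the module
-- and never type the class name.
LOAD MODULE state;
```

With the second route, only the function name and its output schema would be
visible to users.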

Best,
Shengkai

On Fri, Mar 28, 2025 at 10:20 AM Shengkai Fang <fskm...@gmail.com> wrote:

> +1 to use PTF.
>
> I would like to raise a consideration regarding the usage implementation:
> Would it be necessary to allow users to utilize the CREATE FUNCTION
> statement for registering the PTF?
>
> Currently, Flink SQL supports letting external systems register modules
> and leverage these modules to centrally manage all function definitions.
> Given this architectural approach, I’m curious if the plan involves
> introducing additional functions in the future. If so, I would advocate for
> introducing a dedicated state module to centralize such management. This
> would empower users to:
>
> 1. Simply execute the LOAD MODULE command to load the required module, and
> 2. Directly invoke read_metadata thereafter.
>
> For more details about the module, please refer to this document[1].
>
> Best,
> Shengkai
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/
>
> On Fri, Mar 28, 2025 at 12:26 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
>> I just found out that PTFs are not supported in batch mode; please see the
>> dev mailing list thread about it [1].
>>
>> [1] https://lists.apache.org/thread/ytm9m1qt4pq2q2gjngfktrn8vrlvkf07
>>
>> BR,
>> G
>>
>>
>> On Thu, Mar 27, 2025 at 3:38 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> wrote:
>>
>> > In the meantime I've just updated the FLIP according to this to be
>> > optimistic 🙂
>> >
>> > BR,
>> > G
>> >
>> > On Thu, Mar 27, 2025 at 2:15 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> > wrote:
>> >
>> >> Considering all the facts, I am also +1 on PTF. Even if something is
>> >> missing, we can add it later.
>> >>
>> >> @Zakelly Lan <zakelly....@gmail.com> @Shengkai Fang are you also on the
>> >> same page or have something to add?
>> >>
>> >> BR,
>> >> G
>> >>
>> >>
>> >> On Thu, Mar 27, 2025 at 1:50 PM Lincoln Lee <lincoln.8...@gmail.com>
>> >> wrote:
>> >>
>> >>> +1 for PTF
>> >>>
>> >>> > Is it possible to describe such function to see the column names/types?
>> >>>
>> >>> Although Flink SQL does not directly support this feature, users can
>> >>> achieve similar results with the help of the `explain` syntax, e.g.
>> >>> 'explain select * from read_state_metadata(...)'
>> >>>
>> >>>
>> >>> Best,
>> >>> Lincoln Lee
>> >>>
>> >>>
>> >>> On Thu, Mar 27, 2025 at 8:41 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>> >>>
>> >>> > Hey!
>> >>> >
>> >>> > I think the PTF approach strikes a great balance between simplicity and
>> >>> > the capabilities that we get out of it.
>> >>> >
>> >>> > I think this could be a completely viable alternative to the dedicated
>> >>> > connector, +1.
>> >>> >
>> >>> > Cheers,
>> >>> > Gyula
>> >>> >
>> >>> > On Thu, Mar 27, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com>
>> >>> > wrote:
>> >>> >
>> >>> > > Hi, Gabor.
>> >>> > >
>> >>> > > > Do I understand correctly that this is 2.x only feature and we can't
>> >>> > > > backport it to 1.x line
>> >>> > >
>> >>> > > Yes. PTF is only supported in the 2.x version.
>> >>> > >
>> >>> > > > Is it possible to describe such function to see the column names/types?
>> >>> > >
>> >>> > > Flink SQL doesn't support this feature, but PostgreSQL[2] and MySQL[1]
>> >>> > > have a similar feature.
>> >>> > >
>> >>> > > [1] https://dev.mysql.com/doc/refman/8.4/en/show-create-procedure.html
>> >>> > > [2] https://stackoverflow.com/questions/6898453/show-the-code-of-a-function-procedure-and-trigger-in-postgresql
>> >>> > >
>> >>> > > Best,
>> >>> > > Shengkai
>> >>> > >
>> >>> > >
>> >>> > > On Thu, Mar 27, 2025 at 4:25 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> >>> > >
>> >>> > > > Hi Shengkai,
>> >>> > > >
>> >>> > > > Thanks for your effort with the example, this looks promising.
>> >>> > > > I like the fact that users wouldn't need to sweat over complex
>> >>> > > > CREATE TABLE statements.
>> >>> > > >
>> >>> > > > Couple of questions:
>> >>> > > > * Do I understand correctly that this is a 2.x-only feature and we
>> >>> > > > can't backport it to the 1.x line?
>> >>> > > > I don't intend to do any backport, I would just like to know the
>> >>> > > > technical constraints.
>> >>> > > > * Is it possible to describe such a function to see the column
>> >>> > > > names/types?
>> >>> > > >
>> >>> > > > BR,
>> >>> > > > G
>> >>> > > >
>> >>> > > >
>> >>> > > > On Thu, Mar 27, 2025 at 3:17 AM Shengkai Fang <fskm...@gmail.com>
>> >>> > > > wrote:
>> >>> > > >
>> >>> > > > > Many thanks for your reminder, Leonard. Here's the link I
>> >>> > > > > mentioned[1].
>> >>> > > > >
>> >>> > > > > Best,
>> >>> > > > > Shengkai
>> >>> > > > >
>> >>> > > > > [1] https://github.com/apache/flink/pull/26358
>> >>> > > > >
>> >>> > > > > On Thu, Mar 27, 2025 at 10:05 AM Leonard Xu <xbjt...@gmail.com> wrote:
>> >>> > > > >
>> >>> > > > > > Your link is broken, Shengkai
>> >>> > > > > >
>> >>> > > > > > Best,
>> >>> > > > > > Leonard
>> >>> > > > > >
>> >>> > > > > > > On Mar 27, 2025, at 10:01, Shengkai Fang <fskm...@gmail.com> wrote:
>> >>> > > > > > >
>> >>> > > > > > > Hi, All.
>> >>> > > > > > >
>> >>> > > > > > > I wrote a simple demo to illustrate my idea. Hope this helps.
>> >>> > > > > > >
>> >>> > > > > > > Best,
>> >>> > > > > > > Shengkai
>> >>> > > > > > >
>> >>> > > > > > > https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1
>> >>> > > > > > >
>> >>> > > > > > > On Wed, Mar 26, 2025 at 3:54 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> >>> > > > > > >
>> >>> > > > > > >>> I'm fine with a separate SQL connector for metadata, so maybe we
>> >>> > > > > > >>> could update the FLIP about our discussion?
>> >>> > > > > > >>
>> >>> > > > > > >> Sorry, I forgot this part. Yeah, no matter what we choose, I'm
>> >>> > > > > > >> going to update the FLIP.
>> >>> > > > > > >>
>> >>> > > > > > >> G
>> >>> > > > > > >>
>> >>> > > > > > >>
>> >>> > > > > > >> On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> >>> > > > > > >> wrote:
>> >>> > > > > > >>
>> >>> > > > > > >>> Hi All,
>> >>> > > > > > >>>
>> >>> > > > > > >>> I also lack knowledge of PTFs, so I've just read the motivation
>> >>> > > > > > >>> part:
>> >>> > > > > > >>>
>> >>> > > > > > >>> "The SQL 2016 standard introduced a way of defining custom SQL
>> >>> > > > > > >>> operators defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic
>> >>> > > > > > >>> table functions). ~200 pages define how this new kind of function
>> >>> > > > > > >>> can consume and produce tables with various execution properties.
>> >>> > > > > > >>> Unfortunately, this part of the standard is not publicly available."
>> >>> > > > > > >>>
>> >>> > > > > > >>> Of course we can take a look at some examples, but do we really
>> >>> > > > > > >>> want to expose state data with this construct, which is described
>> >>> > > > > > >>> in ~200 pages and part of a standard that is not publicly
>> >>> > > > > > >>> available? 🙂
>> >>> > > > > > >>> I mean, the dataset is a couple of rows and the use-case is
>> >>> > > > > > >>> joining it with another table, such as one holding state data.
>> >>> > > > > > >>> If somebody can name the advantages I would buy that, but from my
>> >>> > > > > > >>> limited understanding this would be overkill here.
>> >>> > > > > > >>>
>> >>> > > > > > >>> BR,
>> >>> > > > > > >>> G
>> >>> > > > > > >>>
>> >>> > > > > > >>>
>> >>> > > > > > >>> On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <gyula.f...@gmail.com>
>> >>> > > > > > >>> wrote:
>> >>> > > > > > >>>
>> >>> > > > > > >>>> Hi Zakelly , Shengkai!
>> >>> > > > > > >>>>
>> >>> > > > > > >>>> I don't know too much about PTFs, it would be interesting to see
>> >>> > > > > > >>>> how the usage would look in practice.
>> >>> > > > > > >>>>
>> >>> > > > > > >>>> Do you have some mockup/example in mind of how the PTF would look,
>> >>> > > > > > >>>> for example when we want to:
>> >>> > > > > > >>>> - Simply display/aggregate what's in the metadata
>> >>> > > > > > >>>> - Join keyed state with some metadata columns
>> >>> > > > > > >>>>
>> >>> > > > > > >>>> Thanks
>> >>> > > > > > >>>> Gyula
>> >>> > > > > > >>>>
>> >>> > > > > > >>>> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <zakelly....@gmail.com>
>> >>> > > > > > >>>> wrote:
>> >>> > > > > > >>>>
>> >>> > > > > > >>>>> Hi everyone,
>> >>> > > > > > >>>>>
>> >>> > > > > > >>>>> I'm fine with a separate SQL connector for metadata, so maybe we
>> >>> > > > > > >>>>> could update the FLIP about our discussion? And Shengkai provided
>> >>> > > > > > >>>>> a PTF implementation; does that also meet the requirement?
>> >>> > > > > > >>>>>
>> >>> > > > > > >>>>>
>> >>> > > > > > >>>>> Best,
>> >>> > > > > > >>>>> Zakelly
>> >>> > > > > > >>>>>
>> >>> > > > > > >>>>> On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> >>> > > > > > >>>>> wrote:
>> >>> > > > > > >>>>>
>> >>> > > > > > >>>>>> Hi All,
>> >>> > > > > > >>>>>>
>> >>> > > > > > >>>>>> @Zakelly: Gyula summarised correctly what I meant, so please
>> >>> > > > > > >>>>>> treat the content as mine.
>> >>> > > > > > >>>>>> In addition, I'm not against adding a CLI at all, I'm just
>> >>> > > > > > >>>>>> stating that in some cases like this, users would like to have
>> >>> > > > > > >>>>>> a self-serving solution where they can provide SQL statements
>> >>> > > > > > >>>>>> which can trigger alerts automatically.
>> >>> > > > > > >>>>>>
>> >>> > > > > > >>>>>> My personal opinion is that a CLI would be beneficial for
>> >>> > > > > > >>>>>> several cases. A good example is when users want to restart a
>> >>> > > > > > >>>>>> job from specific Kafka offsets which are persisted in a
>> >>> > > > > > >>>>>> savepoint. In such a scenario users are more than happy, since
>> >>> > > > > > >>>>>> they expect manual intervention with full control. So all in
>> >>> > > > > > >>>>>> all, one can count on my +1 when a CLI FLIP comes up...
>> >>> > > > > > >>>>>>
>> >>> > > > > > >>>>>> BR,
>> >>> > > > > > >>>>>> G
>> >>> > > > > > >>>>>>
>> >>> > > > > > >>>>>>
>> >>> > > > > > >>>>>> On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <gyula.f...@gmail.com>
>> >>> > > > > > >>>>>> wrote:
>> >>> > > > > > >>>>>>
>> >>> > > > > > >>>>>>> Hi!
>> >>> > > > > > >>>>>>>
>> >>> > > > > > >>>>>>> @Zakelly Lan <zakelly....@gmail.com>
>> >>> > > > > > >>>>>>> I think what Gabor means is that users want to have predefined
>> >>> > > > > > >>>>>>> SQL scripts to perform state analysis tasks to debug/identify
>> >>> > > > > > >>>>>>> problems, such as writing a SQL script that joins the metadata
>> >>> > > > > > >>>>>>> table with the state and does some analytics on it.
>> >>> > > > > > >>>>>>>
>> >>> > > > > > >>>>>>> If we have a meta table then the SQL script that can do this is
>> >>> > > > > > >>>>>>> fixed, and users can trigger it on demand by simply providing a
>> >>> > > > > > >>>>>>> new savepoint path.
>> >>> > > > > > >>>>>>>
>> >>> > > > > > >>>>>>> If we have a different mechanism to extract metadata that is
>> >>> > > > > > >>>>>>> not SQL native, then manual steps need to be executed and a
>> >>> > > > > > >>>>>>> custom SQL script would need to be written that adds the
>> >>> > > > > > >>>>>>> manually extracted metadata into the script.
>> >>> > > > > > >>>>>>>
>> >>> > > > > > >>>>>>> Cheers,
>> >>> > > > > > >>>>>>> Gyula
>> >>> > > > > > >>>>>>>
>> >>> > > > > > >>>>>>> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com>
>> >>> > > > > > >>>>>>> wrote:
>> >>> > > > > > >>>>>>>
>> >>> > > > > > >>>>>>>> Hi all,
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>> Thanks for your answers! Getting everyone aligned on this
>> >>> > > > > > >>>>>>>> topic is challenging, but it’s definitely worth the effort
>> >>> > > > > > >>>>>>>> since it will help streamline things moving forward.
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>> @Gabor you are saying that users are using some scripts to
>> >>> > > > > > >>>>>>>> define the SQL metadata connector and get the information,
>> >>> > > > > > >>>>>>>> right? If so, would a CLI tool be more convenient? It's easy
>> >>> > > > > > >>>>>>>> to invoke and can get the result swiftly. And there should be
>> >>> > > > > > >>>>>>>> some other systems to track the checkpoint lineage and analyze
>> >>> > > > > > >>>>>>>> if there are outliers in metadata (e.g. state size of one
>> >>> > > > > > >>>>>>>> operator), right? Well, maybe I missed something, so please
>> >>> > > > > > >>>>>>>> correct me if I'm wrong.
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL
>> >>> > > > > > >>>>>>>>> native environment where we can serve complex use-cases like
>> >>> > > > > > >>>>>>>>> you would expect in a regular database.
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>> @Gyula Well, this is a good point. From the perspective of a
>> >>> > > > > > >>>>>>>> comprehensive SQL experience, I'd +1 for treating metadata as
>> >>> > > > > > >>>>>>>> data. Although I doubt there is a need for processing
>> >>> > > > > > >>>>>>>> metadata, I won't be against a separate connector.
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>> Regarding the CLI tool, I still think it’s worth implementing.
>> >>> > > > > > >>>>>>>> Such a tool could provide savepoint information before
>> >>> > > > > > >>>>>>>> resuming from a savepoint, which would enhance the user
>> >>> > > > > > >>>>>>>> experience in CLI-based workflows. It would be good if someone
>> >>> > > > > > >>>>>>>> could implement this feature. We shouldn’t worry about whether
>> >>> > > > > > >>>>>>>> this tool might be retired in the future. Regardless of the
>> >>> > > > > > >>>>>>>> SQL-based solution we eventually adopt, this capability will
>> >>> > > > > > >>>>>>>> remain essential for CLI users. This is another topic.
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>> Best,
>> >>> > > > > > >>>>>>>> Zakelly
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com>
>> >>> > > > > > >>>>>>>> wrote:
>> >>> > > > > > >>>>>>>>
>> >>> > > > > > >>>>>>>>> Hi.
>> >>> > > > > > >>>>>>>>>
>> >>> > > > > > >>>>>>>>> After reading the doc[1], I think Spark provides a function
>> >>> > > > > > >>>>>>>>> for users to consume the metadata from the savepoint. In
>> >>> > > > > > >>>>>>>>> Flink SQL, similar functionality is implemented through
>> >>> > > > > > >>>>>>>>> Polymorphic Table Functions (PTF) as proposed in FLIP-440[2].
>> >>> > > > > > >>>>>>>>> Below is a code example[3] illustrating this concept:
>> >>> > > > > > >>>>>>>>>
>> >>> > > > > > >>>>>>>>> ```
>> >>> > > > > > >>>>>>>>>    public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
>> >>> > > > > > >>>>>>>>>        public void eval(Integer i, Boolean b) {
>> >>> > > > > > >>>>>>>>>            collectObjects(i, b);
>> >>> > > > > > >>>>>>>>>        }
>> >>> > > > > > >>>>>>>>>    }
>> >>> > > > > > >>>>>>>>> ```
>> >>> > > > > > >>>>>>>>>
>> >>> > > > > > >>>>>>>>> ```
>> >>> > > > > > >>>>>>>>> INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
>> >>> > > > > > >>>>>>>>> ```
>> >>> > > > > > >>>>>>>>>
>> >>> > > > > > >>>>>>>>> So we can add a builtin function named `read_state_metadata`
>> >>> > > > > > >>>>>>>>> to read savepoint data.
>> >>> > > > > > >>>>>>>>>
>> >>> > > > > > >>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>> Shengkai
>> >>> > > > > > >>>>>>>>>
>> >>> > > > > > >>>>>>>>> [1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
>> >>> > > > > > >>>>>>>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
>> >>> > > > > > >>>>>>>>> [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
>> >>> > > > > > >>>>>>>>>
>> >>> > > > > > >>>>>>>>> On Wed, Mar 19, 2025 at 6:37 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>
>> >>> > > > > > >>>>>>>>>> Hi All!
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>> Thank you for the answers and concerns from everyone.
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>> On the CLI vs State Metadata Connector/Table question I
>> >>> > > > > > >>>>>>>>>> would also like to step back a little and look at the bigger
>> >>> > > > > > >>>>>>>>>> picture.
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL
>> >>> > > > > > >>>>>>>>>> native environment where we can serve complex use-cases like
>> >>> > > > > > >>>>>>>>>> you would expect in a regular database.
>> >>> > > > > > >>>>>>>>>> Most features and developments in recent years have gone
>> >>> > > > > > >>>>>>>>>> this way.
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>> The State Metadata Table would be a natural and
>> >>> > > > > > >>>>>>>>>> straightforward fit here.
>> >>> > > > > > >>>>>>>>>> So from my side, +1 for that.
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>> However I could understand if we are not ready to add a new
>> >>> > > > > > >>>>>>>>>> connector/format due to maintenance concerns (and in general
>> >>> > > > > > >>>>>>>>>> concern about the design).
>> >>> > > > > > >>>>>>>>>> If that's the issue then we should spend more time on the
>> >>> > > > > > >>>>>>>>>> design to get comfortable with the approach and seek
>> >>> > > > > > >>>>>>>>>> feedback from the wider community.
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>> I am -1 for the CLI/tooling approach, as that will not
>> >>> > > > > > >>>>>>>>>> provide the feature set we are looking for that is not
>> >>> > > > > > >>>>>>>>>> already covered by the Java connector. And that approach
>> >>> > > > > > >>>>>>>>>> would come with the same maintenance implications.
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>> Cheers
>> >>> > > > > > >>>>>>>>>> Gyula
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> >>> > > > > > >>>>>>>>>> wrote:
>> >>> > > > > > >>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>> Hi Zakelly, Shengkai
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>> Several topics are going on, so I'm adding gist answers to
>> >>> > > > > > >>>>>>>>>>> them. If some topic is not touched, please highlight it.
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>> @Shengkai: I've read through all the previous FLIPs related
>> >>> > > > > > >>>>>>>>>>> to catalogs, and if we would like to keep the concepts
>> >>> > > > > > >>>>>>>>>>> there, then a one-to-one mapping relationship between
>> >>> > > > > > >>>>>>>>>>> savepoint and catalog is a reasonable direction. In short,
>> >>> > > > > > >>>>>>>>>>> I'm happy that you've highlighted this and I agree as a
>> >>> > > > > > >>>>>>>>>>> whole. I've written it down previously, I just want to
>> >>> > > > > > >>>>>>>>>>> double confirm that the state catalog is essential and
>> >>> > > > > > >>>>>>>>>>> planned. When we reach this point then your input is more
>> >>> > > > > > >>>>>>>>>>> than welcome.
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>> @Zakelly: We've tried the CLI and separate library
>> >>> > > > > > >>>>>>>>>>> approaches with users already, and these are not welcome
>> >>> > > > > > >>>>>>>>>>> because of the following:
>> >>> > > > > > >>>>>>>>>>> * Users want to have automated tasks and not manual
>> >>> > > > > > >>>>>>>>>>> CLI/library output parsing. This can be hacked around, but
>> >>> > > > > > >>>>>>>>>>> our experience with this is negative because it's just
>> >>> > > > > > >>>>>>>>>>> brittle.
>> >>> > > > > > >>>>>>>>>>> * From a development perspective it's a much bigger effort
>> >>> > > > > > >>>>>>>>>>> than a connector (hard to test, packaging/version handling
>> >>> > > > > > >>>>>>>>>>> is an extra layer of complexity, external FS authentication
>> >>> > > > > > >>>>>>>>>>> is a pain for users, as is expecting them to download
>> >>> > > > > > >>>>>>>>>>> savepoints)
>> >>> > > > > > >>>>>>>>>>> * Purely personal opinion, but if we find better ways
>> >>> > > > > > >>>>>>>>>>> later, then retiring a CLI is not more lightweight than
>> >>> > > > > > >>>>>>>>>>> retiring a connector
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> It would be great if you give some examples on how user
>> >>> > > > > > >>>>>>>>>>>> could leverage the separate connector to process the
>> >>> > > > > > >>>>>>>>>>>> metadata.
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>> The simplest cases:
>> >>> > > > > > >>>>>>>>>>> * give me the overgrowing state uids
>> >>> > > > > > >>>>>>>>>>> * give me the unknown (new or renamed) state uids
>> >>> > > > > > >>>>>>>>>>> * give me the state uids where state size drastically
>> >>> > > > > > >>>>>>>>>>> dropped compared to a previous savepoint (accidental state
>> >>> > > > > > >>>>>>>>>>> loss)
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>> Since it was mentioned: as a general off-topic teaser,
>> >>> > > > > > >>>>>>>>>>> yeah, it would be good to have some sort of
>> >>> > > > > > >>>>>>>>>>> checkpoint/savepoint lineage or however we call it.
>> >>> > > > > > >>>>>>>>>>> Since we've not yet reached this point there are no
>> >>> > > > > > >>>>>>>>>>> technical details, it's more like a vision. It's a common
>> >>> > > > > > >>>>>>>>>>> pattern that jobs are physically running but somehow the
>> >>> > > > > > >>>>>>>>>>> state processing is stuck, and it would be good to add some
>> >>> > > > > > >>>>>>>>>>> way to find it out automatically.
>> >>> > > > > > >>>>>>>>>>> The important word here is automation and not manual
>> >>> > > > > > >>>>>>>>>>> evaluation, since handling 10k+ jobs just does not allow
>> >>> > > > > > >>>>>>>>>>> that.
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>> BR,
>> >>> > > > > > >>>>>>>>>>> G
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>> On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com>
>> >>> > > > > > >>>>>>>>>>> wrote:
>> >>> > > > > > >>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> Hi, All.
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> About State Catalog, I want to share more thoughts about
>> >>> > > > > > >>>>>>>>>>>> this.
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> In the initial design concept, I understood that a
>> >>> > > > > > >>>>>>>>>>>> savepoint and a state catalog have a one-to-one mapping
>> >>> > > > > > >>>>>>>>>>>> relationship. Each operator corresponds to a database, and
>> >>> > > > > > >>>>>>>>>>>> the state of each operator is represented as individual
>> >>> > > > > > >>>>>>>>>>>> tables. The rationale behind this design is:
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> *State Diversity*: An operator may involve multiple types
>> >>> > > > > > >>>>>>>>>>>> of states. For example, in our VVR design, a "multi-join"
>> >>> > > > > > >>>>>>>>>>>> operator uses keyed states for two input streams and a
>> >>> > > > > > >>>>>>>>>>>> broadcast state for the third stream. This makes it
>> >>> > > > > > >>>>>>>>>>>> challenging to represent all states of an operator within
>> >>> > > > > > >>>>>>>>>>>> a single table.
>> >>> > > > > > >>>>>>>>>>>> *Scalability*: Internally, an operator might have multiple
>> >>> > > > > > >>>>>>>>>>>> keyed states (e.g., value state and list state). However,
>> >>> > > > > > >>>>>>>>>>>> large list states may not fit entirely in memory. To
>> >>> > > > > > >>>>>>>>>>>> address this, we recommend implementing each state as a
>> >>> > > > > > >>>>>>>>>>>> separate table.
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> To resolve the loosely coupled relationships between
>> >>> > > > > > >>>>>>>>>>>> operator states, we propose embedding predefined views
>> >>> > > > > > >>>>>>>>>>>> within the catalog. These views simplify user
>> >>> > > > > > >>>>>>>>>>>> understanding of operator implementations and provide a
>> >>> > > > > > >>>>>>>>>>>> more intuitive perspective. For instance, a join operator
>> >>> > > > > > >>>>>>>>>>>> may have multiple state implementations (depending on
>> >>> > > > > > >>>>>>>>>>>> whether the join key includes unique attributes), but
>> >>> > > > > > >>>>>>>>>>>> users primarily care about the data associated with a
>> >>> > > > > > >>>>>>>>>>>> specific join key across input streams.
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> Returning to the one-to-one mapping between savepoints and
>> >>> > > > > > >>>>>>>>>>>> catalogs, we aim to manage multiple user state catalogs
>> >>> > > > > > >>>>>>>>>>>> through a catalog store. When a user triggers a savepoint
>> >>> > > > > > >>>>>>>>>>>> for a job on the platform:
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> 1. The platform sends a REST request to the JobManager.
>> >>> > > > > > >>>>>>>>>>>> 2. Simultaneously, it registers a new state catalog in the catalog
>> >>> > > > > > >>>>>>>>>>>> store, enabling immediate analysis of state data on the platform.
>> >>> > > > > > >>>>>>>>>>>> 3. Deleting a savepoint would also trigger the removal of its
>> >>> > > > > > >>>>>>>>>>>> associated catalog.
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> This vision assumes that states are self-describing or that a state
>> >>> > > > > > >>>>>>>>>>>> metaservice is introduced to analyze savepoint structures.
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> How can users create logic to identify differences between multiple
>> >>> > > > > > >>>>>>>>>>>>> savepoints?
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> Since savepoints and state catalogs are one-to-one mapped, users can
>> >>> > > > > > >>>>>>>>>>>> query metadata via their respective catalogs. For example:
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>`
>> >>> > > > > > >>>>>>>>>>>> provides operator-specific metadata (e.g., state size, type).
>> >>> > > > > > >>>>>>>>>>>> 2. Comparing metadata tables (e.g., schema versions, state entry
>> >>> > > > > > >>>>>>>>>>>> counts) across catalogs reveals structural or quantitative
>> >>> > > > > > >>>>>>>>>>>> differences.
>> >>> > > > > > >>>>>>>>>>>> 3. For deeper analysis, users could write SQL queries to compare
>> >>> > > > > > >>>>>>>>>>>> specific state partitions or leverage the metaservice to track state
>> >>> > > > > > >>>>>>>>>>>> evolution (e.g., added/removed operators, modified state
>> >>> > > > > > >>>>>>>>>>>> configurations).
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> If we plan to introduce a state catalog in the future, I would lean
>> >>> > > > > > >>>>>>>>>>>> toward using metadata tables. If a utility tool can address the
>> >>> > > > > > >>>>>>>>>>>> challenges we face, could we avoid introducing an additional
>> >>> > > > > > >>>>>>>>>>>> connector?
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>>>>> Shengkai
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>> On Mon, Mar 17, 2025 at 20:25, Gyula Fóra <gyula.f...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> Hi All!
>> >>> > > > > > >>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> Without going into too much detail, here are my 2 cents regarding
>> >>> > > > > > >>>>>>>>>>>>> the virtual column / catalog metadata / table (connector) discussion
>> >>> > > > > > >>>>>>>>>>>>> for the State metadata.
>> >>> > > > > > >>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> State metadata such as the types of states, their properties, names,
>> >>> > > > > > >>>>>>>>>>>>> sizes, etc. are all valuable information that can be used to enrich
>> >>> > > > > > >>>>>>>>>>>>> the computations we do on state.
>> >>> > > > > > >>>>>>>>>>>>> We can analyze it standalone (such as discovering anomalies in
>> >>> > > > > > >>>>>>>>>>>>> large jobs with many states), across multiple savepoints (discovering
>> >>> > > > > > >>>>>>>>>>>>> how state changed over time), or by joining it with keyed or
>> >>> > > > > > >>>>>>>>>>>>> non-keyed state data to serve more complex queries on the state.
>> >>> > > > > > >>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> The only solution that seems to serve all these use-cases and
>> >>> > > > > > >>>>>>>>>>>>> requirements in a straightforward and SQL canonical way is to simply
>> >>> > > > > > >>>>>>>>>>>>> expose the state metadata as a separate table. This is a metadata
>> >>> > > > > > >>>>>>>>>>>>> table, but you can also think of it as a data table; it makes no
>> >>> > > > > > >>>>>>>>>>>>> practical difference here.
>> >>> > > > > > >>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> Once we have a catalog later, the catalog can offer this table out
>> >>> > > > > > >>>>>>>>>>>>> of the box, the same way databases provide metadata tables. For this
>> >>> > > > > > >>>>>>>>>>>>> to work, however, we need another, simpler connector that creates
>> >>> > > > > > >>>>>>>>>>>>> this table.
>> >>> > > > > > >>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> +1 for state metadata as a separate connector/table, instead of
>> >>> > > > > > >>>>>>>>>>>>> adding virtual columns and ad hoc catalog metadata that is hard to
>> >>> > > > > > >>>>>>>>>>>>> use in a large number of queries.
>> >>> > > > > > >>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> Cheers,
>> >>> > > > > > >>>>>>>>>>>>> Gyula
>> >>> > > > > > >>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>> On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> >>> > > > > > >>>>>>>>>>>>> wrote:
>> >>> > > > > > >>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> 1. State TTL for Value Columns
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>> I’m planning on adding this, and we may collaborate on it in the
>> >>> > > > > > >>>>>>>>>>>>>>> future.
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> +1 on this, just ping me.
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> After some code digging and a POC, all I can say is that with heavy
>> >>> > > > > > >>>>>>>>>>>>>> effort we can maybe add changes that let us show the metadata of a
>> >>> > > > > > >>>>>>>>>>>>>> savepoint from the catalog.
>> >>> > > > > > >>>>>>>>>>>>>> I'm not against that, but from a user perspective this has limited
>> >>> > > > > > >>>>>>>>>>>>>> value; let me explain why.
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> From a high-level perspective I see the following points, on which
>> >>> > > > > > >>>>>>>>>>>>>> I see agreement:
>> >>> > > > > > >>>>>>>>>>>>>> * We should have a catalog which represents the savepoint data set
>> >>> > > > > > >>>>>>>>>>>>>> of one or more jobs (future plan)
>> >>> > > > > > >>>>>>>>>>>>>> * Savepoints should be able to be registered in the catalog, where
>> >>> > > > > > >>>>>>>>>>>>>> they then appear as databases (future plan)
>> >>> > > > > > >>>>>>>>>>>>>> * There must be a possibility to create tables from databases where
>> >>> > > > > > >>>>>>>>>>>>>> users can read state data (exists already)
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> In terms of metadata, if I understand correctly, the suggested
>> >>> > > > > > >>>>>>>>>>>>>> approach would be to access it from the catalog describe command,
>> >>> > > > > > >>>>>>>>>>>>>> right? Adding that info when a specific database describe command
>> >>> > > > > > >>>>>>>>>>>>>> is executed could be done.
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> The question is, for instance, how users can create logic that
>> >>> > > > > > >>>>>>>>>>>>>> tells them what the difference is between multiple savepoints.
>> >>> > > > > > >>>>>>>>>>>>>> Just to give some examples:
>> >>> > > > > > >>>>>>>>>>>>>> * per-operator size changes between savepoints
>> >>> > > > > > >>>>>>>>>>>>>> * show values from operator data where state size reaches a boundary
>> >>> > > > > > >>>>>>>>>>>>>> * in general, "find which checkpoint ruined things" is quite a
>> >>> > > > > > >>>>>>>>>>>>>> common pattern
>> >>> > > > > > >>>>>>>>>>>>>> What I would like to highlight here is that from the Flink point of
>> >>> > > > > > >>>>>>>>>>>>>> view the metadata can be considered static side-output information,
>> >>> > > > > > >>>>>>>>>>>>>> but for users these values are actual real data around which they
>> >>> > > > > > >>>>>>>>>>>>>> plan to build logic.
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>> The metadata is more like one-time information instead of a
>> >>> > > > > > >>>>>>>>>>>>>>> streaming data that changes all the time, so a single connector
>> >>> > > > > > >>>>>>>>>>>>>>> seems to be an overkill.
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> State data is also static within a savepoint, and that's the reason
>> >>> > > > > > >>>>>>>>>>>>>> why the state processor API works in batch mode.
>> >>> > > > > > >>>>>>>>>>>>>> When we handle multiple checkpoints in a streaming fashion, this
>> >>> > > > > > >>>>>>>>>>>>>> can be viewed from another angle.
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> We can come up with a more lightweight solution than a new
>> >>> > > > > > >>>>>>>>>>>>>> connector, but forcing users to parse the catalog describe command
>> >>> > > > > > >>>>>>>>>>>>>> output in order to compare multiple savepoints doesn't sound like a
>> >>> > > > > > >>>>>>>>>>>>>> smooth user experience.
>> >>> > > > > > >>>>>>>>>>>>>> Honestly, I have no other idea for exposing metadata as real user
>> >>> > > > > > >>>>>>>>>>>>>> data, so I'm waiting for other approaches.
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> BR,
>> >>> > > > > > >>>>>>>>>>>>>> G
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>> On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com>
>> >>> > > > > > >>>>>>>>>>>>>> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>> Looking forward to hearing the good news!
>> >>> > > > > > >>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>>>>>>>> Shengkai
>> >>> > > > > > >>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 22:24, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>> Thanks to both of you for the valuable input!
>> >>> > > > > > >>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>> Let me take a closer look at the suggestions, like the Catalog
>> >>> > > > > > >>>>>>>>>>>>>>>> capabilities and the possibility of embedding TypeInformation or
>> >>> > > > > > >>>>>>>>>>>>>>>> StateDescriptor metadata directly into the raw state files...
>> >>> > > > > > >>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>> BR,
>> >>> > > > > > >>>>>>>>>>>>>>>> G
>> >>> > > > > > >>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com>
>> >>> > > > > > >>>>>>>>>>>>>>>> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> Thanks for Zakelly's clarification.
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> +1 to delay the discussion about this.
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> I’d like to share my perspective on the State Catalog proposal.
>> >>> > > > > > >>>>>>>>>>>>>>>>> While introducing this capability is beneficial, there is a
>> >>> > > > > > >>>>>>>>>>>>>>>>> blocker: the current StateBackend architecture does not permit
>> >>> > > > > > >>>>>>>>>>>>>>>>> operators to encode TypeInformation into the state; it only
>> >>> > > > > > >>>>>>>>>>>>>>>>> preserves the Serializer. This limitation creates an asymmetry, as
>> >>> > > > > > >>>>>>>>>>>>>>>>> operators alone retain knowledge of the data structure’s schema.
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> To address this, I suggest allowing operators to embed
>> >>> > > > > > >>>>>>>>>>>>>>>>> TypeInformation or StateDescriptor metadata directly into the raw
>> >>> > > > > > >>>>>>>>>>>>>>>>> state files. Such a design would enable the Catalog to:
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> 1. Parse state files and programmatically derive the schema and
>> >>> > > > > > >>>>>>>>>>>>>>>>> structural guarantees for each state.
>> >>> > > > > > >>>>>>>>>>>>>>>>> 2. Leverage existing Flink Table utilities, such as
>> >>> > > > > > >>>>>>>>>>>>>>>>> LegacyTypeInfoDataTypeConverter (in
>> >>> > > > > > >>>>>>>>>>>>>>>>> org.apache.flink.table.types.utils), to bridge TypeInformation and
>> >>> > > > > > >>>>>>>>>>>>>>>>> DataType conversions.
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> If we cannot store the TypeInformation or StateDescriptor in the
>> >>> > > > > > >>>>>>>>>>>>>>>>> raw state files, I am +1 for this FLIP using a metadata column to
>> >>> > > > > > >>>>>>>>>>>>>>>>> retrieve the information.
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>>>>>>>>>> Shengkai
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 12:43, Zakelly Lan <zakelly....@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> Hi Gabor and Shengkai,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> Thanks for sharing your thoughts! This is a long discussion and
>> >>> > > > > > >>>>>>>>>>>>>>>>>> sorry for the late reply (I'm busy catching up with release 2.0
>> >>> > > > > > >>>>>>>>>>>>>>>>>> these days).
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> Let me first clarify your thoughts to ensure I understand
>> >>> > > > > > >>>>>>>>>>>>>>>>>> correctly. IIUC, there is no persistent configuration for state
>> >>> > > > > > >>>>>>>>>>>>>>>>>> TTL in the checkpoint. While you can infer that TTL is enabled by
>> >>> > > > > > >>>>>>>>>>>>>>>>>> reading the serializer, the checkpoint itself only stores the last
>> >>> > > > > > >>>>>>>>>>>>>>>>>> access time for each value. So the only thing we can show is the
>> >>> > > > > > >>>>>>>>>>>>>>>>>> last access time for each value. But it is not required for all
>> >>> > > > > > >>>>>>>>>>>>>>>>>> state backends to store this, as they may directly store the
>> >>> > > > > > >>>>>>>>>>>>>>>>>> expired time. This will also increase the difficulty of
>> >>> > > > > > >>>>>>>>>>>>>>>>>> implementation & maintenance.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> This once again reiterates the importance of unified metadata for
>> >>> > > > > > >>>>>>>>>>>>>>>>>> checkpoints. I’m planning on adding this, and we may collaborate
>> >>> > > > > > >>>>>>>>>>>>>>>>>> on it in the future.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> I'm not in favor of adding a new connector for metadata. The
>> >>> > > > > > >>>>>>>>>>>>>>>>>> metadata is more like one-time information instead of a streaming
>> >>> > > > > > >>>>>>>>>>>>>>>>>> data that changes all the time, so a single connector seems to be
>> >>> > > > > > >>>>>>>>>>>>>>>>>> an overkill. It is not easy to withdraw a connector if we have a
>> >>> > > > > > >>>>>>>>>>>>>>>>>> better solution in the future. I'm not familiar with the current
>> >>> > > > > > >>>>>>>>>>>>>>>>>> Catalog capabilities, but if it could extract and show some
>> >>> > > > > > >>>>>>>>>>>>>>>>>> operator-level information from a savepoint, that would be great.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> If the Catalog can't do that, I would consider the current FLIP to
>> >>> > > > > > >>>>>>>>>>>>>>>>>> be a compromise solution.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> And if we have that unified metadata for checkpoint/savepoint in
>> >>> > > > > > >>>>>>>>>>>>>>>>>> the future, we may directly register the savepoint in the catalog
>> >>> > > > > > >>>>>>>>>>>>>>>>>> and create a source without specifying complex columns, as well as
>> >>> > > > > > >>>>>>>>>>>>>>>>>> describe the savepoint catalog to get the metadata. That's a good
>> >>> > > > > > >>>>>>>>>>>>>>>>>> solution in my mind.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>>>>>>>>>>> Zakelly
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com>
>> >>> > > > > > >>>>>>>>>>>>>>>>>> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> Hi Gabor,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> I would argue against introducing a new connector type named
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> savepoint-metadata, as the existing Catalog mechanism can
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> inherently provide the necessary connector factory capabilities.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> I’ve detailed this proposal in branch[1]. Please take a moment to
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> review it.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> If we introduce a connector named `savepoint-metadata`, it means
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> users can create a temporary table with the connector
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> `savepoint-metadata`, and the connector needs to check whether the
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> table schema is the same as the schema we proposed in the FLIP. On
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> the other hand, it's not easy work for others to users a metadata
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> table with same schema.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> [1]
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> Shengkai
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 16:56, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> From a directional perspective I agree with your idea of how it
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> can be implemented.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> Previously I've mentioned that TTL information is not exposed on
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> the state processor API (which the SQL state connector uses to
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> read data), and unless somebody shows me the opposite, this FLIP
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> is not going to address this, to avoid feature creep. Our users
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> are also interested in TTL, so sooner or later we're going to
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> expose it; this is a matter of scheduling.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I'm not sure I understand your point about StateCatalog. First of all,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I can't agree more that StateCatalog is needed and is a planned building
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> block in an upcoming FLIP, but I'm not sure how it can help now.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> No matter what, your knowledge is essential when we add StateCatalog.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> Let me lay out my understanding in this area:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * First we need create table statements to access state data and
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>   metadata
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * Once we have that, we can add StateCatalog, which could make users'
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>   lives easier by, for example, providing off-the-shelf tables so they
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>   don't have to sweat over create table statements
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> User expectations:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See state data (this is fulfilled with the existing connector)
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See metadata about state data like TTL (this can be added as a
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>   metadata column as you suggested since it belongs to the data)
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See metadata about operators (this can be added from
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>   savepoint-metadata)
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> It is important to highlight that the state data table format differs
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> from the state metadata table format. Namely, one table has rows for
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> state values and the other has rows for operators, right?
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I think that's the reason why you've pointed out that the suggested
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> metadata columns are somewhat clunky.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> In conclusion, I agree to add a ${state-name}_ttl metadata column later
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> on, since it belongs to the state value, and to add a new table type
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> for metadata (like you suggested, similar to PG [1]). Please see how
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> Spark does that too [2].
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> If you have a better approach, please elaborate with more details and
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> help me understand your point.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB savepoints that the number of keys
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> can be extremely huge but not the per key state itself.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in a
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> separate jira.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I've just created https://issues.apache.org/jira/browse/FLINK-37456.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> BR,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> G
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for your response.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Thank you for addressing the limitations here. However, I believe it
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> would be beneficial to further clarify the API in this FLIP regarding
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> how users can specify the TTL column.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> One potential approach that comes to mind is using a standardized
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> naming convention such as ${state-name}_ttl for the metadata column
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> that defines the TTL value. In terms of implementation, the
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> listReadableMetadata function could:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. Read the table’s columns and configuration,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Extract all defined state names, and
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 3. Return a structured list of metadata entries formatted as
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>    ${state-name}_ttl.
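
The three steps above can be sketched in a few lines of plain Java. This is only an illustration of the ${state-name}_ttl naming convention, not Flink code: the class name `TtlMetadataKeys`, the method `listReadableTtlMetadata`, the example state names, and the choice of BIGINT as the exposed type are all assumptions made for the sketch. A real connector would return Flink data types from its own metadata method instead of strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the naming convention; not part of Flink. */
public final class TtlMetadataKeys {

    private TtlMetadataKeys() {}

    /**
     * Builds one metadata entry per state name, keyed as "<state-name>_ttl".
     * The value is the SQL type the column would expose; BIGINT is an
     * assumption made for this sketch.
     */
    public static Map<String, String> listReadableTtlMetadata(List<String> stateNames) {
        // LinkedHashMap keeps the entries in the order the states were declared.
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String stateName : stateNames) {
            metadata.put(stateName + "_ttl", "BIGINT");
        }
        return metadata;
    }

    public static void main(String[] args) {
        // State names here are made up for the example.
        System.out.println(listReadableTtlMetadata(List.of("MyValueState", "MyListState")));
    }
}
```

The state names would come from steps 1 and 2 (the table's columns and configuration); the sketch only covers step 3.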
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> WDYT?
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Introducing a new connector type at this stage may unnecessarily
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> complicate the system. Given that every table already belongs to a
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Catalog, which is designed to provide a Factory for building source or
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> sink connectors, I propose integrating a dedicated StateCatalog
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> instead. This approach would allow us to:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. Leverage the Catalog’s existing capabilities to manage TTL metadata
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>    (e.g., state names and TTL logic) without duplicating functionality.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Provide a unified interface for connector instantiation and
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>    metadata handling through the Catalog’s Factory pattern.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Would this design decision better align with our architecture’s
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> extensibility and reduce redundancy?
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB savepoints that the number of keys
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> can be extremely huge but not the per key state itself.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in a
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> separate jira.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> +1 for a separate jira.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> Shengkai
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> On Mon, Mar 10, 2025 at 19:05, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Please see my comments inline.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> BR,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> G
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for the FLIP. I have some questions about the FLIP:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> How can users retrieve the state TTL (Time-to-Live) for each value
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> column? From my understanding of the current design, it seems that
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> this functionality is not supported. Could you clarify if there are
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> plans to address this limitation?
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Since the state processor API does not yet expose this information,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> this would require several steps.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> First, support needs to be added to the state processor API, which
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> can then be exposed on the SQL API.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This is definitely a useful future improvement and can be handled in
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> a separate jira.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> The metadata information described in the FLIP appears to be
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> intended to describe the state files stored at a specific location.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> To me, this concept aligns more closely with system tables like
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Adding a new connector with `savepoint-metadata` is one way to
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> create such functionality.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> I'm not against that, I just want us to agree that we would like to
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> move in that direction.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> (As a side note, not just PG but Spark also has a similar approach,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> and I basically like the idea.)
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> If we go in that direction, savepoint metadata can be exposed in
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> such a way that one row represents an operator with its values,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> something like this:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┐
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │operatorN│operatorU│operatorH│paralleli│maxParall│subtaskSt│coordinat│totalSta│
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │ame      │id       │ash      │sm       │elism    │atesCount│orStateSi│tesSizeI│
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │         │         │         │         │         │zeInBytes│nBytes  │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │Source:  │datagen-s│47aee9439│2        │128      │2        │16       │546     │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │datagen-s│ource-uid│4d6ea26e2│         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │ource    │         │d544bef0a│         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │         │37bb5    │         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │long-udf-│long-udf-│6ed3f40bf│2        │128      │2        │0        │0       │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │with-mast│with-mast│f3c8dfcdf│         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │er-hook  │er-hook-u│cb95128a1│         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │id       │018f1    │         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │value-pro│value-pro│ca4f5fe9a│2        │128      │2        │0        │40726   │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │cess     │cess-uid │637b656f0│         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │         │9ea78b3e7│         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> │         │         │a15b9    │         │         │         │         │        │
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This table can then be joined with the tables created by the
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> already existing `savepoint` connector, based on the UID hash
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> (which is unique and always exists).
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This would mean that the already existing table would need only a
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> single metadata column, which is the UID hash.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> WDYT?
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> @zakelly, plz share your thoughts too.
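
The join described above can be modelled in memory as a small hash join keyed on the UID hash. This is a sketch under stated assumptions, not how Flink SQL would execute it: the class names `UidHashJoin`, `OperatorRow` and `StateRow` are hypothetical, only a few of the table's columns are modelled, and the state key "k1" is made up. The operator UID, hash and parallelism in `main` are taken from the value-process row of the table in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: models the join in memory, not Flink's SQL runtime. */
public final class UidHashJoin {

    /** One row of the proposed operator metadata table (subset of columns). */
    public static final class OperatorRow {
        public final String operatorUid;
        public final String operatorHash;
        public final int parallelism;

        public OperatorRow(String operatorUid, String operatorHash, int parallelism) {
            this.operatorUid = operatorUid;
            this.operatorHash = operatorHash;
            this.parallelism = parallelism;
        }
    }

    /** One row of a state table whose only metadata column is the UID hash. */
    public static final class StateRow {
        public final String operatorHash;
        public final String key;

        public StateRow(String operatorHash, String key) {
            this.operatorHash = operatorHash;
            this.key = key;
        }
    }

    /** Hash join: index operators by UID hash, then look up every state row. */
    public static List<String> join(List<OperatorRow> operators, List<StateRow> states) {
        Map<String, OperatorRow> byHash = new HashMap<>();
        for (OperatorRow op : operators) {
            byHash.put(op.operatorHash, op);
        }
        List<String> joined = new ArrayList<>();
        for (StateRow state : states) {
            OperatorRow op = byHash.get(state.operatorHash);
            if (op != null) { // inner join: drop state rows without a matching operator
                joined.add(state.key + " | " + op.operatorUid + " | " + op.parallelism);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        String hash = "ca4f5fe9a637b656f09ea78b3e7a15b9";
        List<OperatorRow> operators = List.of(new OperatorRow("value-process-uid", hash, 2));
        List<StateRow> states = List.of(new StateRow(hash, "k1")); // "k1" is a made-up key
        System.out.println(join(operators, states));
    }
}
```

Because the hash is unique per operator, the state table needs to carry nothing else to reach the human-readable UID and the other operator-level columns.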
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> If we opt to use metadata columns, every record in the table would
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> end up having identical values for these columns (please correct me
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> if I’m mistaken). On the other hand, the state connector requires
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> users to specify an operator UID or operator UID hash, after which
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> it outputs user-defined values in its records. This approach feels
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> somewhat redundant to me.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> If we add a new `savepoint-metadata` connector, then this can be
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> addressed.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> On the other hand, UID and UID hash have an either-or relationship
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> from the config perspective, so when a user provides the UID they
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> may still be interested in the hash for further calculations
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> (the whole Flink internals depend on the hash). Printing out the
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> human-readable UID is an explicit requirement from the user side
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> because hashes are not human-readable.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 3. Handling LIST and MAP States in the State Connector
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> I have concerns about how the current design handles
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> LIST and MAP states. Specifically, the state connector
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> uses Flink SQL’s MAP and ARRAY types, which implies
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> that it attempts to load entire MAP or LIST states
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> into memory.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> However, in many real-world scenarios, these states
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> can grow very large. Typically, the state API
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> addresses this by providing an iterator to traverse
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> elements within the state incrementally. I’m unsure
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> whether I’ve missed something in FLIP-496 or FLIP-512,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> but it seems that the current design might struggle
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> with scalability in such cases.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> You are right, the current implementation keeps the
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> state for a single key in memory.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> We considered this potential issue earlier and
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> concluded that addressing it is not strictly
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> necessary for the initial version and can be done
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> as a later improvement.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Up until now we've seen, even in TB-sized
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> savepoints, that the number of keys can be extremely
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> large but not the per-key state itself.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature on its own and
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> can be handled in a separate Jira.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Shengkai
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 2:00 AM Gabor Somogyi
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> <gabor.g.somo...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> Hi Zakelly,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> To keep things simple, the target is
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> `METADATA VIRTUAL` as the keywords for definition.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> If it's not too complex, the latter can be added too.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> BR,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> G
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> <zakelly....@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi Gabor,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> +1 for this.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Will the metadata column use `METADATA VIRTUAL` as
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> key words for definition, or
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> `METADATA FROM xxx VIRTUAL` for renaming, just like
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> the Kafka table?
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Zakelly
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> <gabor.g.somo...@gmail.com> wrote:
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Hi All,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to start a discussion of FLIP-512: Add meta
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> information to SQL state connector [1].
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Feel free to add your thoughts to make this feature
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> better.
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> BR,
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> G
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
