Hi, Gabor.

> Do I understand correctly that this is 2.x only feature and we can't
> backport it to 1.x line?

Yes. PTF is only supported in the 2.x line.

> Is it possible to describe such function to see the column names/types?

Flink SQL doesn't support this feature, but PostgreSQL[2] and MySQL[1] have
a similar feature.

[1] https://dev.mysql.com/doc/refman/8.4/en/show-create-procedure.html
[2] https://stackoverflow.com/questions/6898453/show-the-code-of-a-function-procedure-and-trigger-in-postgresql

Best,
Shengkai


On Thu, Mar 27, 2025 at 4:25 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

> Hi Shengkai,
>
> Thanks for your effort with the example, this looks promising.
> I like the fact that users wouldn't need to sweat with complex create table
> statements.
>
> Couple of questions:
> * Do I understand correctly that this is 2.x only feature and we can't
> backport it to 1.x line?
> I don't intend to do any backport, I would just like to know the technical
> constraints.
> * Is it possible to describe such function to see the column names/types?
>
> BR,
> G
>
>
> On Thu, Mar 27, 2025 at 3:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> > Many thanks for your reminder, Leonard. Here's the link I mentioned[1].
> >
> > Best,
> > Shengkai
> >
> > [1] https://github.com/apache/flink/pull/26358
> >
> > On Thu, Mar 27, 2025 at 10:05 AM Leonard Xu <xbjt...@gmail.com> wrote:
> >
> > > Your link is broken, Shengkai
> > >
> > > Best,
> > > Leonard
> > >
> > > > On Mar 27, 2025, at 10:01, Shengkai Fang <fskm...@gmail.com> wrote:
> > > >
> > > > Hi, All.
> > > >
> > > > > I wrote a simple demo to illustrate my idea. Hope this helps.
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > > >
> > > > > https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1
> > > >
> > > > > On Wed, Mar 26, 2025 at 3:54 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >
> > > > >>> I'm fine with a separate SQL connector for metadata, so maybe we could
> > > > >>> update the FLIP about our discussion?
> > > > >>
> > > > >> Sorry, I forgot this part. Yeah, no matter what we choose, I'm going to
> > > > >> update the FLIP.
> > > >>
> > > >> G
> > > >>
> > > >>
> > > > >> On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>
> > > >>> Hi All,
> > > >>>
> > > > >>> I also lack knowledge of PTFs, so I've only read the motivation part:
> > > > >>>
> > > > >>> "The SQL 2016 standard introduced a way of defining custom SQL operators
> > > > >>> defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic table functions).
> > > > >>> ~200 pages define how this new kind of function can consume and produce
> > > > >>> tables with various execution properties.
> > > > >>> Unfortunately, this part of the standard is not publicly available."
> > > > >>>
> > > > >>> Of course we can take a look at some examples, but do we really want to
> > > > >>> expose state data with this construct, which is described in ~200 pages
> > > > >>> of a standard that is partly not publicly available? 🙂
> > > > >>> I mean the dataset is a couple of rows and the use case is a join with
> > > > >>> another table, like with state data.
> > > > >>> If somebody can list the advantages I would buy that, but from my limited
> > > > >>> understanding this would be overkill here.
> > > >>>
> > > >>> BR,
> > > >>> G
> > > >>>
> > > >>>
> > > > >>> On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >>>
> > > > >>>> Hi Zakelly, Shengkai!
> > > > >>>>
> > > > >>>> I don't know too much about PTFs; it would be interesting to see how the
> > > > >>>> usage would look in practice.
> > > > >>>>
> > > > >>>> Do you have some mockup/example in mind of how the PTF would look, for
> > > > >>>> example when we want to:
> > > > >>>> - Simply display/aggregate what's in the metadata
> > > > >>>> - Join keyed state with some metadata columns
> > > >>>>
> > > >>>> Thanks
> > > >>>> Gyula
> > > >>>>
> > > > >>>> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > >>>>
> > > >>>>> Hi everyone,
> > > >>>>>
> > > > >>>>> I'm fine with a separate SQL connector for metadata, so maybe we could
> > > > >>>>> update the FLIP about our discussion? And Shengkai provided a PTF
> > > > >>>>> implementation; does that also meet the requirement?
> > > >>>>>
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Zakelly
> > > >>>>>
> > > > >>>>> On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Hi All,
> > > >>>>>>
> > > > >>>>>> @Zakelly: Gyula summarised correctly what I meant, so please treat the
> > > > >>>>>> content as mine.
> > > > >>>>>> In addition, I'm not against adding a CLI at all; I'm just stating that
> > > > >>>>>> in some cases like this, users would like to have a self-serving
> > > > >>>>>> solution where they can provide SQL statements which can trigger alerts
> > > > >>>>>> automatically.
> > > > >>>>>>
> > > > >>>>>> My personal opinion is that a CLI would be beneficial in several cases.
> > > > >>>>>> A good example is when users want to restart a job from specific Kafka
> > > > >>>>>> offsets which are persisted in a savepoint. In such a scenario users are
> > > > >>>>>> more than happy, since they expect manual intervention with full
> > > > >>>>>> control. So all in all, one can count on my +1 when a CLI FLIP comes
> > > > >>>>>> up...
> > > >>>>>>
> > > >>>>>> BR,
> > > >>>>>> G
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>> On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>> Hi!
> > > >>>>>>>
> > > >>>>>>> @Zakelly Lan <zakelly....@gmail.com>
> > > > >>>>>>> I think what Gabor means is that users want to have predefined SQL
> > > > >>>>>>> scripts to perform state analysis tasks to debug/identify problems,
> > > > >>>>>>> such as a SQL script that joins the metadata table with the state and
> > > > >>>>>>> does some analytics on it.
> > > > >>>>>>>
> > > > >>>>>>> If we have a meta table, then the SQL script that can do this is fixed,
> > > > >>>>>>> and users can trigger it on demand by simply providing a new savepoint
> > > > >>>>>>> path.
> > > > >>>>>>>
> > > > >>>>>>> If we have a different mechanism to extract metadata that is not SQL
> > > > >>>>>>> native, then manual steps need to be executed and a custom SQL script
> > > > >>>>>>> would need to be written that adds the manually extracted metadata
> > > > >>>>>>> into the script.
> > > >>>>>>>
> > > >>>>>>> Cheers,
> > > >>>>>>> Gyula
> > > >>>>>>>
> > > > >>>>>>> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi all,
> > > >>>>>>>>
> > > > >>>>>>>> Thanks for your answers! Getting everyone aligned on this topic is
> > > > >>>>>>>> challenging, but it’s definitely worth the effort since it will help
> > > > >>>>>>>> streamline things moving forward.
> > > > >>>>>>>>
> > > > >>>>>>>> @Gabor are you saying that users are using some scripts to define the
> > > > >>>>>>>> SQL metadata connector and get the information, right? If so, would a
> > > > >>>>>>>> CLI tool be more convenient? It's easy to invoke and can get the
> > > > >>>>>>>> result swiftly. And there should be some other systems to track the
> > > > >>>>>>>> checkpoint lineage and analyze if there are outliers in metadata
> > > > >>>>>>>> (e.g. state size of one operator), right? Well, maybe I missed
> > > > >>>>>>>> something so please correct me if I'm wrong.
> > > > >>>>>>>>
> > > > >>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL native
> > > > >>>>>>>>> environment where we can serve complex use-cases like you would
> > > > >>>>>>>>> expect in a regular database.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> @Gyula Well, this is a good point. From the perspective of
> > > > >>>>>>>> comprehensive SQL experience, I'd +1 for treating metadata as data.
> > > > >>>>>>>> Although I doubt if there is a need for processing metadata, I won't
> > > > >>>>>>>> be against a separate connector.
> > > > >>>>>>>>
> > > > >>>>>>>> Regarding the CLI tool, I still think it’s worth implementing. Such a
> > > > >>>>>>>> tool could provide savepoint information before resuming from a
> > > > >>>>>>>> savepoint, which would enhance the user experience in CLI-based
> > > > >>>>>>>> workflows. It would be good if someone could implement this feature.
> > > > >>>>>>>> We shouldn’t worry about whether this tool might be retired in the
> > > > >>>>>>>> future. Regardless of the SQL-based solution we eventually adopt, this
> > > > >>>>>>>> capability will remain essential for CLI users. This is another topic.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Zakelly
> > > >>>>>>>>
> > > >>>>>>>>
> > > > >>>>>>>> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi.
> > > >>>>>>>>>
> > > > >>>>>>>>> After reading the doc[1], I think Spark provides a function for users
> > > > >>>>>>>>> to consume the metadata from the savepoint. In Flink SQL, similar
> > > > >>>>>>>>> functionality is implemented through Polymorphic Table Functions
> > > > >>>>>>>>> (PTF) as proposed in FLIP-440[2]. Below is a code example[3]
> > > > >>>>>>>>> illustrating this concept:
> > > > >>>>>>>>>
> > > > >>>>>>>>> ```
> > > > >>>>>>>>>    // PTF from Flink's planner test programs[3]; it takes two scalar arguments.
> > > > >>>>>>>>>    public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
> > > > >>>>>>>>>        public void eval(Integer i, Boolean b) {
> > > > >>>>>>>>>            collectObjects(i, b);
> > > > >>>>>>>>>        }
> > > > >>>>>>>>>    }
> > > > >>>>>>>>> ```
> > > > >>>>>>>>>
> > > > >>>>>>>>> ```
> > > > >>>>>>>>> -- Invokes the function (registered here as `f`) with named arguments.
> > > > >>>>>>>>> INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
> > > > >>>>>>>>> ```
> > > > >>>>>>>>>
> > > > >>>>>>>>> So we can add a builtin function named `read_state_metadata` to read
> > > > >>>>>>>>> savepoint data.
> > > >>>>>>>>>
> > > >>>>>>>>> Best,
> > > >>>>>>>>> Shengkai
> > > >>>>>>>>>
> > > >>>>>>>>> [1]
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
> > > >>>>>>>>> [2]
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
> > > >>>>>>>>> [3]
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
> > > >>>>>>>>>
> > > > >>>>>>>>> On Wed, Mar 19, 2025 at 6:37 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi All!
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thank you for the answers and concerns from everyone.
> > > >>>>>>>>>>
> > > > >>>>>>>>>> On the CLI vs State Metadata Connector/Table question I would also
> > > > >>>>>>>>>> like to step back a little and look at the bigger picture.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL native
> > > > >>>>>>>>>> environment where we can serve complex use-cases like you would
> > > > >>>>>>>>>> expect in a regular database.
> > > > >>>>>>>>>> Most features and developments in recent years have gone this way.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> The State Metadata Table would be a natural and straightforward fit
> > > > >>>>>>>>>> here.
> > > > >>>>>>>>>> So from my side, +1 for that.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> However, I could understand if we are not ready to add a new
> > > > >>>>>>>>>> connector/format due to maintenance concerns (and general concerns
> > > > >>>>>>>>>> about the design).
> > > > >>>>>>>>>> If that's the issue, then we should spend more time on the design to
> > > > >>>>>>>>>> get comfortable with the approach and seek feedback from the wider
> > > > >>>>>>>>>> community.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I am -1 for the CLI/tooling approach, as it will not provide the
> > > > >>>>>>>>>> feature set we are looking for beyond what is already covered by the
> > > > >>>>>>>>>> Java connector. And that approach would come with the same
> > > > >>>>>>>>>> maintenance implications.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Cheers
> > > >>>>>>>>>> Gyula
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > > >>>>>>>>>> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi Zakelly, Shengkai
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Several topics are going on, so I'm adding short answers to them.
> > > > >>>>>>>>>>> If some topic is not touched, please highlight it.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> @Shengkai: I've read through all the previous FLIPs related to
> > > > >>>>>>>>>>> catalogs, and if we would like to keep the concepts there, then a
> > > > >>>>>>>>>>> one-to-one mapping relationship between savepoint and catalog is a
> > > > >>>>>>>>>>> reasonable direction. In short, I'm happy that you've highlighted
> > > > >>>>>>>>>>> this and agree as a whole. I've written it down previously, I just
> > > > >>>>>>>>>>> want to double confirm that the state catalog is essential and
> > > > >>>>>>>>>>> planned. When we reach this point, your input is more than welcome.
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>> @Zakelly: We've tried the CLI and separate library approaches with
> > > > >>>>>>>>>>> users already, and these are not welcome because of the following:
> > > > >>>>>>>>>>> * Users want to have automated tasks and not manual CLI/library
> > > > >>>>>>>>>>> output parsing. This can be hacked around, but our experience with
> > > > >>>>>>>>>>> this is negative because it's just brittle.
> > > > >>>>>>>>>>> * From a development perspective it's a much bigger effort than a
> > > > >>>>>>>>>>> connector (hard to test, packaging/version handling is an extra
> > > > >>>>>>>>>>> layer of complexity, external FS authentication is a pain for users,
> > > > >>>>>>>>>>> as is expecting them to download savepoints)
> > > > >>>>>>>>>>> * Purely personal opinion, but if we find better ways later, then
> > > > >>>>>>>>>>> retiring a CLI is not more lightweight than retiring a connector
> > > > >>>>>>>>>>>> It would be great if you give some examples on how user could
> > > > >>>>>>>>>>>> leverage the separate connector to process the metadata.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> The simplest cases:
> > > > >>>>>>>>>>> * give me the overgrowing state uids
> > > > >>>>>>>>>>> * give me the unknown (new or renamed) state uids
> > > > >>>>>>>>>>> * give me the state uids where state size dropped drastically
> > > > >>>>>>>>>>> compared to a previous savepoint (accidental state loss)
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Since it was mentioned: as a general off-topic teaser, yeah, it would
> > > > >>>>>>>>>>> be good to have some sort of checkpoint/savepoint lineage or however
> > > > >>>>>>>>>>> we call it. Since we've not yet reached this point there are no
> > > > >>>>>>>>>>> technical details, it's more like a vision. It's a common pattern
> > > > >>>>>>>>>>> that jobs are physically running but somehow the state processing is
> > > > >>>>>>>>>>> stuck, and it would be good to add some way to find that out
> > > > >>>>>>>>>>> automatically. The important word here is automation and not manual
> > > > >>>>>>>>>>> evaluation, since handling 10k+ jobs just does not allow that.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> BR,
> > > >>>>>>>>>>> G
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Hi, All.
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> About State Catalog, I want to share more thoughts about this.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> In the initial design concept, I understood that a savepoint and a
> > > > >>>>>>>>>>>> state catalog have a one-to-one mapping relationship. Each operator
> > > > >>>>>>>>>>>> corresponds to a database, and the state of each operator is
> > > > >>>>>>>>>>>> represented as individual tables. The rationale behind this design
> > > > >>>>>>>>>>>> is:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> *State Diversity*: An operator may involve multiple types of states.
> > > > >>>>>>>>>>>> For example, in our VVR design, a "multi-join" operator uses keyed
> > > > >>>>>>>>>>>> states for two input streams and a broadcast state for the third
> > > > >>>>>>>>>>>> stream. This makes it challenging to represent all states of an
> > > > >>>>>>>>>>>> operator within a single table.
> > > > >>>>>>>>>>>> *Scalability*: Internally, an operator might have multiple keyed
> > > > >>>>>>>>>>>> states (e.g., value state and list state). However, large list
> > > > >>>>>>>>>>>> states may not fit entirely in memory. To address this, we recommend
> > > > >>>>>>>>>>>> implementing each state as a separate table.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> To resolve the loosely coupled relationships between operator
> > > > >>>>>>>>>>>> states, we propose embedding predefined views within the catalog.
> > > > >>>>>>>>>>>> These views simplify user understanding of operator implementations
> > > > >>>>>>>>>>>> and provide a more intuitive perspective. For instance, a join
> > > > >>>>>>>>>>>> operator may have multiple state implementations (depending on
> > > > >>>>>>>>>>>> whether the join key includes unique attributes), but users
> > > > >>>>>>>>>>>> primarily care about the data associated with a specific join key
> > > > >>>>>>>>>>>> across input streams.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Returning to the one-to-one mapping between savepoints and catalogs,
> > > > >>>>>>>>>>>> we aim to manage multiple user state catalogs through a catalog
> > > > >>>>>>>>>>>> store. When a user triggers a savepoint for a job on the platform:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> 1. The platform sends a REST request to the JobManager.
> > > > >>>>>>>>>>>> 2. Simultaneously, it registers a new state catalog in the catalog
> > > > >>>>>>>>>>>> store, enabling immediate analysis of state data on the platform.
> > > > >>>>>>>>>>>> 3. Deleting a savepoint would also trigger the removal of its
> > > > >>>>>>>>>>>> associated catalog.
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> This vision assumes that states are self-describing or that a state
> > > > >>>>>>>>>>>> metaservice is introduced to analyze savepoint structures.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> How can users create logic to identify differences between multiple
> > > > >>>>>>>>>>>>> savepoints?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Since savepoints and state catalogs are one-to-one mapped, users can
> > > > >>>>>>>>>>>> query metadata via their respective catalogs. For example:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>`
> > > > >>>>>>>>>>>> provides operator-specific metadata (e.g., state size, type).
> > > > >>>>>>>>>>>> 2. Comparing metadata tables (e.g., schema versions, state entry
> > > > >>>>>>>>>>>> counts) across catalogs reveals structural or quantitative
> > > > >>>>>>>>>>>> differences.
> > > > >>>>>>>>>>>> 3. For deeper analysis, users could write SQL queries to compare
> > > > >>>>>>>>>>>> specific state partitions or leverage the metaservice to track
> > > > >>>>>>>>>>>> state evolution (e.g., added/removed operators, modified state
> > > > >>>>>>>>>>>> configurations).
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> If we plan to introduce a state catalog in the future, I would lean
> > > > >>>>>>>>>>>> toward using metadata tables. If a utility tool can address the
> > > > >>>>>>>>>>>> challenges we face, could we avoid introducing an additional
> > > > >>>>>>>>>>>> connector?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best,
> > > >>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Mon, Mar 17, 2025 at 8:25 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hi All!
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Without going into too much detail, here are my 2 cents regarding
> > > > >>>>>>>>>>>>> the virtual column / catalog metadata / table (connector)
> > > > >>>>>>>>>>>>> discussion for the State metadata.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> State metadata such as the types of states, their properties,
> > > > >>>>>>>>>>>>> names, sizes etc. are all valuable information that can be used to
> > > > >>>>>>>>>>>>> enrich the computations we do on state.
> > > > >>>>>>>>>>>>> We can either analyze it standalone (such as discover anomalies,
> > > > >>>>>>>>>>>>> for large jobs with many states), across multiple savepoints
> > > > >>>>>>>>>>>>> (discover how state changed over time) or by joining it with keyed
> > > > >>>>>>>>>>>>> or non-keyed state data to serve more complex queries on the
> > > > >>>>>>>>>>>>> state.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> The only solution that seems to serve all these use-cases and
> > > > >>>>>>>>>>>>> requirements in a straightforward and SQL canonical way is to
> > > > >>>>>>>>>>>>> simply expose the state metadata as a separate table. This is a
> > > > >>>>>>>>>>>>> metadata table but you can also think of it as a data table, it
> > > > >>>>>>>>>>>>> makes no practical difference here.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Once we have a catalog later, the catalog can offer this table out
> > > > >>>>>>>>>>>>> of the box, the same way databases provide metadata tables. For
> > > > >>>>>>>>>>>>> this to work, however, we need another, simpler connector that
> > > > >>>>>>>>>>>>> creates this table.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> +1 for state metadata as a separate connector/table, instead of
> > > > >>>>>>>>>>>>> adding virtual columns and ad hoc catalog metadata that is hard to
> > > > >>>>>>>>>>>>> use in a large number of queries.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>> Gyula
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> 1. State TTL for Value Columns
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> I’m planning on adding this, and we may collaborate on it in the
> > > > >>>>>>>>>>>>>>> future.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> +1 on this, just ping me.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> After some code digging and a POC, all I can say is that with heavy
> > > > >>>>>>>>>>>>>> effort we can maybe add changes so that we're able to show the
> > > > >>>>>>>>>>>>>> metadata of a savepoint from the catalog.
> > > > >>>>>>>>>>>>>> I'm not against that, but from a user perspective this has limited
> > > > >>>>>>>>>>>>>> value; let me explain why.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> From a high-level perspective I see the following, on which I see
> > > > >>>>>>>>>>>>>> agreement:
> > > > >>>>>>>>>>>>>> * We should have a catalog which represents the savepoint data set
> > > > >>>>>>>>>>>>>> of one or more jobs (future plan)
> > > > >>>>>>>>>>>>>> * Savepoints should be able to be registered in the catalog, which
> > > > >>>>>>>>>>>>>> are then databases (future plan)
> > > > >>>>>>>>>>>>>> * There must be a possibility to create tables from databases where
> > > > >>>>>>>>>>>>>> users can read state data (exists already)
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> In terms of metadata, if I understand correctly, the suggested
> > > > >>>>>>>>>>>>>> approach would be to access it from the catalog describe command,
> > > > >>>>>>>>>>>>>> right? Adding that info when a specific database describe command
> > > > >>>>>>>>>>>>>> is executed could be done.
> > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> The question is, for instance, how can users create logic that
> > > > >>>>>>>>>>>>>> tells them what the difference is between multiple savepoints?
> > > > >>>>>>>>>>>>>> Just to give some examples:
> > > > >>>>>>>>>>>>>> * per operator size changes between savepoints
> > > > >>>>>>>>>>>>>> * show values from operator data where state size reaches a
> > > > >>>>>>>>>>>>>> boundary
> > > > >>>>>>>>>>>>>> * in general "find which checkpoint ruined things" is quite a
> > > > >>>>>>>>>>>>>> common pattern
> > > > >>>>>>>>>>>>>> What I would like to highlight here is that from the Flink point of
> > > > >>>>>>>>>>>>>> view the metadata can be considered static side-output information,
> > > > >>>>>>>>>>>>>> but for users these values are actual real data around which logic
> > > > >>>>>>>>>>>>>> is planned to be built.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> The metadata is more like one-time information instead of a
> > > > >>>>>>>>>>>>>>> streaming data that changes all the time, so a single connector
> > > > >>>>>>>>>>>>>>> seems to be an overkill.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> State data is also static within a savepoint, and that's the reason
> > > > >>>>>>>>>>>>>> why the state processor API is working in batch mode.
> > > > >>>>>>>>>>>>>> When we handle multiple checkpoints in a streaming fashion, then
> > > > >>>>>>>>>>>>>> this can be viewed from another angle.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> We can come up with a more lightweight solution other than a new
> > > > >>>>>>>>>>>>>> connector, but forcing users to parse the catalog describe command
> > > > >>>>>>>>>>>>>> output in order to compare multiple savepoints doesn't sound like a
> > > > >>>>>>>>>>>>>> smooth user experience.
> > > > >>>>>>>>>>>>>> Honestly I've no other idea how to expose metadata as real user
> > > > >>>>>>>>>>>>>> data, so I'm waiting for other approaches.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> BR,
> > > >>>>>>>>>>>>>> G
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Looking forward to hearing the good news!
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 10:24 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Thanks to both of you for the valuable input!
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Let me take a closer look at the suggestions, like the Catalog
> > > > >>>>>>>>>>>>>>>> capabilities and the possibility of embedding TypeInformation or
> > > > >>>>>>>>>>>>>>>> StateDescriptor metadata directly into the raw state files...
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> BR,
> > > >>>>>>>>>>>>>>>> G
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Thanks for Zakelly's clarification.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> +1 to delay the discussion about this.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I’d like to share my perspective on the State Catalog proposal.
> > > >>>>>>>>>>>>>>>>> While introducing this capability is beneficial, there is a
> > > >>>>>>>>>>>>>>>>> blocker: the current StateBackend architecture does not permit
> > > >>>>>>>>>>>>>>>>> operators to encode TypeInformation into the state; it only
> > > >>>>>>>>>>>>>>>>> preserves the Serializer. This limitation creates an asymmetry,
> > > >>>>>>>>>>>>>>>>> as operators alone retain knowledge of the data structure’s
> > > >>>>>>>>>>>>>>>>> schema.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> To address this, I suggest allowing operators to embed
> > > >>>>>>>>>>>>>>>>> TypeInformation or StateDescriptor metadata directly into the
> > > >>>>>>>>>>>>>>>>> raw state files. Such a design would enable the Catalog to:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 1. Parse state files and programmatically derive the schema and
> > > >>>>>>>>>>>>>>>>> structural guarantees for each state.
> > > >>>>>>>>>>>>>>>>> 2. Leverage existing Flink Table utilities, such as
> > > >>>>>>>>>>>>>>>>> LegacyTypeInfoDataTypeConverter (in
> > > >>>>>>>>>>>>>>>>> org.apache.flink.table.types.utils), to bridge TypeInformation
> > > >>>>>>>>>>>>>>>>> and DataType conversions.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> If we cannot store the TypeInformation or StateDescriptor in
> > > >>>>>>>>>>>>>>>>> the raw state files, I am +1 for this FLIP using metadata
> > > >>>>>>>>>>>>>>>>> columns to retrieve the information.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 12:43, Zakelly Lan <zakelly....@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Hi Gabor and Shengkai,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Thanks for sharing your thoughts! This is a long discussion and
> > > >>>>>>>>>>>>>>>>>> sorry for the late reply (I'm busy catching up with release 2.0
> > > >>>>>>>>>>>>>>>>>> these days).
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Let me first clarify your thoughts to ensure I understand
> > > >>>>>>>>>>>>>>>>>> correctly. IIUC, there is no persistent configuration for state
> > > >>>>>>>>>>>>>>>>>> TTL in the checkpoint. While you can infer that TTL is enabled
> > > >>>>>>>>>>>>>>>>>> by reading the serializer, the checkpoint itself only stores
> > > >>>>>>>>>>>>>>>>>> the last access time for each value. So the only thing we can
> > > >>>>>>>>>>>>>>>>>> show is the last access time for each value. But not all state
> > > >>>>>>>>>>>>>>>>>> backends are required to store this, as they may directly store
> > > >>>>>>>>>>>>>>>>>> the expiration time. This will also increase the difficulty of
> > > >>>>>>>>>>>>>>>>>> implementation & maintenance.
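
To illustrate why the stored timestamp alone is not enough, here is a minimal sketch in plain Python (not Flink code). The helper names and the "expired once last access plus TTL has passed" rule are assumptions made for illustration; the point is only that a last access time needs the configured TTL to become an expiration time, whereas a backend that stores the expiration time directly does not.

```python
def expiration_time_ms(last_access_ms: int, ttl_ms: int) -> int:
    """Derive an expiration timestamp from a stored last access time.

    Hypothetical helper: ttl_ms is the TTL configuration, which per the
    discussion is not persisted in the checkpoint itself.
    """
    return last_access_ms + ttl_ms


def is_expired(last_access_ms: int, ttl_ms: int, now_ms: int) -> bool:
    """Assumed rule: a value is expired once its expiration time has passed."""
    return expiration_time_ms(last_access_ms, ttl_ms) <= now_ms


# A value last accessed at t=1000 ms with an assumed TTL of 500 ms.
print(expiration_time_ms(1000, 500))
print(is_expired(1000, 500, now_ms=1499))
```

Without ttl_ms, a reader of the checkpoint can report last_access_ms but cannot say when the value expires.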
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> This once again reiterates the importance of unified metadata
> > > >>>>>>>>>>>>>>>>>> for checkpoints. I’m planning on adding this, and we may
> > > >>>>>>>>>>>>>>>>>> collaborate on it in the future.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I'm not in favor of adding a new connector for metadata. The
> > > >>>>>>>>>>>>>>>>>> metadata is more like one-time information than streaming data
> > > >>>>>>>>>>>>>>>>>> that changes all the time, so a dedicated connector seems to be
> > > >>>>>>>>>>>>>>>>>> overkill. It is not easy to withdraw a connector if we have a
> > > >>>>>>>>>>>>>>>>>> better solution in the future. I'm not familiar with the
> > > >>>>>>>>>>>>>>>>>> current Catalog capabilities, but if it could extract and show
> > > >>>>>>>>>>>>>>>>>> some operator-level information from a savepoint, that would be
> > > >>>>>>>>>>>>>>>>>> great.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> If the Catalog can't do that, I would consider the current FLIP
> > > >>>>>>>>>>>>>>>>>> to be a compromise solution.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> And if we have that unified metadata for checkpoints/savepoints
> > > >>>>>>>>>>>>>>>>>> in the future, we may directly register a savepoint in the
> > > >>>>>>>>>>>>>>>>>> catalog and create a source without specifying complex columns,
> > > >>>>>>>>>>>>>>>>>> as well as describe the savepoint catalog to get the metadata.
> > > >>>>>>>>>>>>>>>>>> That's a good solution in my mind.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>> Zakelly
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Hi Gabor,
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I would argue against introducing a new connector type named
> > > >>>>>>>>>>>>>>>>>>> savepoint-metadata, as the existing Catalog mechanism can
> > > >>>>>>>>>>>>>>>>>>> inherently provide the necessary connector factory
> > > >>>>>>>>>>>>>>>>>>> capabilities. I’ve detailed this proposal in branch[1]. Please
> > > >>>>>>>>>>>>>>>>>>> take a moment to review it.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> If we introduce a connector named `savepoint-metadata`, it
> > > >>>>>>>>>>>>>>>>>>> means users can create a temporary table with connector
> > > >>>>>>>>>>>>>>>>>>> `savepoint-metadata`, and the connector needs to check whether
> > > >>>>>>>>>>>>>>>>>>> the table schema is the same as the schema we proposed in the
> > > >>>>>>>>>>>>>>>>>>> FLIP. On the other hand, it's not easy for others to use a
> > > >>>>>>>>>>>>>>>>>>> metadata table with the same schema.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 16:56, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Hi Shengkai,
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> From a directional perspective I agree with your idea of how
> > > >>>>>>>>>>>>>>>>>>>> it can be implemented. Previously I've mentioned that TTL
> > > >>>>>>>>>>>>>>>>>>>> information is not exposed on the state processor API (which
> > > >>>>>>>>>>>>>>>>>>>> the SQL state connector uses to read data), and unless
> > > >>>>>>>>>>>>>>>>>>>> somebody shows me the opposite, this FLIP is not going to
> > > >>>>>>>>>>>>>>>>>>>> address this, to avoid feature creep. Our users are also
> > > >>>>>>>>>>>>>>>>>>>> interested in TTL, so sooner or later we're going to expose
> > > >>>>>>>>>>>>>>>>>>>> it; this is a matter of scheduling.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Not sure I understand your point related to StateCatalog.
> > > >>>>>>>>>>>>>>>>>>>> First of all, I can't agree more that StateCatalog is needed
> > > >>>>>>>>>>>>>>>>>>>> and is a planned building block in an upcoming FLIP, but I'm
> > > >>>>>>>>>>>>>>>>>>>> not sure how it can help now. No matter what, your knowledge
> > > >>>>>>>>>>>>>>>>>>>> is essential when we add StateCatalog. Let me lay out my
> > > >>>>>>>>>>>>>>>>>>>> understanding in this area:
> > > >>>>>>>>>>>>>>>>>>>> * First we need create table statements to access state data
> > > >>>>>>>>>>>>>>>>>>>> and metadata
> > > >>>>>>>>>>>>>>>>>>>> * When we have that, we can add StateCatalog, which could
> > > >>>>>>>>>>>>>>>>>>>> potentially ease the life of users by, e.g., giving
> > > >>>>>>>>>>>>>>>>>>>> off-the-shelf tables without sweating with create table
> > > >>>>>>>>>>>>>>>>>>>> statements
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> User expectations:
> > > >>>>>>>>>>>>>>>>>>>> * See state data (this is fulfilled with the existing
> > > >>>>>>>>>>>>>>>>>>>> connector)
> > > >>>>>>>>>>>>>>>>>>>> * See metadata about state data like TTL (this can be added as
> > > >>>>>>>>>>>>>>>>>>>> metadata column as you suggested since it belongs to the data)
> > > >>>>>>>>>>>>>>>>>>>> * See metadata about operators (this can be added from
> > > >>>>>>>>>>>>>>>>>>>> savepoint-metadata)
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> It's important to highlight that the state data table format
> > > >>>>>>>>>>>>>>>>>>>> differs from the state metadata table format. Namely, one
> > > >>>>>>>>>>>>>>>>>>>> table has rows for state values and another has rows for
> > > >>>>>>>>>>>>>>>>>>>> operators, right?
> > > >>>>>>>>>>>>>>>>>>>> I think that's the reason why you've pointed out that the
> > > >>>>>>>>>>>>>>>>>>>> suggested metadata columns are somewhat clunky.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> As a conclusion, I agree to add the ${state-name}_ttl metadata
> > > >>>>>>>>>>>>>>>>>>>> column later on since it belongs to the state value, and to
> > > >>>>>>>>>>>>>>>>>>>> add a new table type (like you suggested, similar to PG [1])
> > > >>>>>>>>>>>>>>>>>>>> for metadata. Please see how Spark does that too [2].
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> If you have a better approach then please elaborate with more
> > > >>>>>>>>>>>>>>>>>>>> details and help me to understand your point.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB savepoints that the number
> > > >>>>>>>>>>>>>>>>>>>>> of keys can be extremely huge but not the per key state
> > > >>>>>>>>>>>>>>>>>>>>> itself.
> > > >>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in
> > > >>>>>>>>>>>>>>>>>>>>> a separate jira.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> I've just created https://issues.apache.org/jira/browse/FLINK-37456.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > >>>>>>>>>>>>>>>>>>>> [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> BR,
> > > >>>>>>>>>>>>>>>>>>>> G
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for your response.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Thank you for addressing the limitations here. However, I
> > > >>>>>>>>>>>>>>>>>>>>> believe it would be beneficial to further clarify the API in
> > > >>>>>>>>>>>>>>>>>>>>> this FLIP regarding how users can specify the TTL column.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> One potential approach that comes to mind is using a
> > > >>>>>>>>>>>>>>>>>>>>> standardized naming convention such as ${state-name}_ttl for
> > > >>>>>>>>>>>>>>>>>>>>> the metadata column that defines the TTL value. In terms of
> > > >>>>>>>>>>>>>>>>>>>>> implementation, the listReadableMetadata function could:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> 1. Read the table’s columns and configuration,
> > > >>>>>>>>>>>>>>>>>>>>> 2. Extract all defined state names, and
> > > >>>>>>>>>>>>>>>>>>>>> 3. Return a structured list of metadata entries formatted as
> > > >>>>>>>>>>>>>>>>>>>>> ${state-name}_ttl.
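
The three steps above can be sketched in plain Python rather than against the Flink connector API. The function name ttl_metadata_keys and the sample state names are hypothetical; in a real connector this logic would presumably sit behind listReadableMetadata and take the state names from the table definition.

```python
def ttl_metadata_keys(state_names):
    """Return one '<state-name>_ttl' metadata key per distinct state name.

    Hypothetical helper: it only illustrates the naming convention and
    is not part of Flink.
    """
    seen = set()
    keys = []
    for name in state_names:
        if name in seen:
            continue  # each state contributes a single TTL key
        seen.add(name)
        keys.append(f"{name}_ttl")
    return keys


# Assumed state names, as they might be extracted from the table's columns.
example_keys = ttl_metadata_keys(["MyValueState", "MyListState", "MyValueState"])
print(example_keys)
```

Deriving the keys from the table definition keeps the set of readable metadata in step with whatever states the user declared, which is the appeal of a naming convention over a fixed list.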
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> WDYT?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Introducing a new connector type at this stage may
> > > >>>>>>>>>>>>>>>>>>>>> unnecessarily complicate the system. Given that every table
> > > >>>>>>>>>>>>>>>>>>>>> already belongs to a Catalog, which is designed to provide a
> > > >>>>>>>>>>>>>>>>>>>>> Factory for building source or sink connectors, I propose
> > > >>>>>>>>>>>>>>>>>>>>> integrating a dedicated StateCatalog instead. This approach
> > > >>>>>>>>>>>>>>>>>>>>> would allow us to:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> 1. Leverage the Catalog’s existing capabilities to manage TTL
> > > >>>>>>>>>>>>>>>>>>>>> metadata (e.g., state names and TTL logic) without
> > > >>>>>>>>>>>>>>>>>>>>> duplicating functionality.
> > > >>>>>>>>>>>>>>>>>>>>> 2. Provide a unified interface for connector instantiation
> > > >>>>>>>>>>>>>>>>>>>>> and metadata handling through the Catalog’s Factory pattern.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Would this design decision better align with our
> > > >>>>>>>>>>>>>>>>>>>>> architecture’s extensibility and reduce redundancy?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB savepoints that the
> > > >>>>>>>>>>>>>>>>>>>>>> number of keys can be extremely huge but not the per key
> > > >>>>>>>>>>>>>>>>>>>>>> state itself.
> > > >>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled
> > > >>>>>>>>>>>>>>>>>>>>>> in a separate jira.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> +1 for a separate jira.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Mon, Mar 10, 2025 at 19:05, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Please see my comments inline.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> BR,
> > > >>>>>>>>>>>>>>>>>>>>>> G
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for the FLIP. I have some questions about
> > > >>>>>>>>>>>>>>>>>>>>>>> the FLIP:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
> > > >>>>>>>>>>>>>>>>>>>>>>> How can users retrieve the state TTL (Time-to-Live) for
> > > >>>>>>>>>>>>>>>>>>>>>>> each value column? From my understanding of the current
> > > >>>>>>>>>>>>>>>>>>>>>>> design, it seems that this functionality is not supported.
> > > >>>>>>>>>>>>>>>>>>>>>>> Could you clarify if there are plans to address this
> > > >>>>>>>>>>>>>>>>>>>>>>> limitation?
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Since the state processor API does not yet expose this
> > > >>>>>>>>>>>>>>>>>>>>>> information, this would require several steps.
> > > >>>>>>>>>>>>>>>>>>>>>> First, support needs to be added to the state processor
> > > >>>>>>>>>>>>>>>>>>>>>> API, which can then be exposed on the SQL API.
> > > >>>>>>>>>>>>>>>>>>>>>> This is definitely a useful future improvement and can be
> > > >>>>>>>>>>>>>>>>>>>>>> handled in a separate jira.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
> > > >>>>>>>>>>>>>>>>>>>>>>> The metadata information described in the FLIP appears to
> > > >>>>>>>>>>>>>>>>>>>>>>> be intended to describe the state files stored at a
> > > >>>>>>>>>>>>>>>>>>>>>>> specific location. To me, this concept aligns more closely
> > > >>>>>>>>>>>>>>>>>>>>>>> with system tables like pg_tables in PostgreSQL [1] or the
> > > >>>>>>>>>>>>>>>>>>>>>>> INFORMATION_SCHEMA in MySQL [2].
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Adding a new connector with `savepoint-metadata` is a
> > > >>>>>>>>>>>>>>>>>>>>>> possibility where we can create such functionality.
> > > >>>>>>>>>>>>>>>>>>>>>> I'm not against that, I just want to have a common
> > > >>>>>>>>>>>>>>>>>>>>>> agreement that we would like to move in that direction.
> > > >>>>>>>>>>>>>>>>>>>>>> (As a side note, not just PG but Spark also has a similar
> > > >>>>>>>>>>>>>>>>>>>>>> approach, and I basically like the idea.)
> > > >>>>>>>>>>>>>>>>>>>>>> If we go in that direction, savepoint metadata can be
> > > >>>>>>>>>>>>>>>>>>>>>> exposed such that one row represents an operator with its
> > > >>>>>>>>>>>>>>>>>>>>>> values, something like this:
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> ┌─────────────────────────┬─────────────────────────────┬────────────────────────────────┬───────────┬──────────────┬──────────────────┬───────────────────────────┬──────────────────────┐
> > > >>>>>>>>>>>>>>>>>>>>>> │operatorName             │operatorUid                  │operatorHash                    │parallelism│maxParallelism│subtaskStatesCount│coordinatorStateSizeInBytes│totalStatesSizeInBytes│
> > > >>>>>>>>>>>>>>>>>>>>>> ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > >>>>>>>>>>>>>>>>>>>>>> │Source: datagen-source   │datagen-source-uid           │47aee94394d6ea26e2d544bef0a37bb5│2          │128           │2                 │16                         │546                   │
> > > >>>>>>>>>>>>>>>>>>>>>> ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > >>>>>>>>>>>>>>>>>>>>>> │long-udf-with-master-hook│long-udf-with-master-hook-uid│6ed3f40bff3c8dfcdfcb95128a1018f1│2          │128           │2                 │0                          │0                     │
> > > >>>>>>>>>>>>>>>>>>>>>> ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > >>>>>>>>>>>>>>>>>>>>>> │value-process            │value-process-uid            │ca4f5fe9a637b656f09ea78b3e7a15b9│2          │128           │2                 │0                          │40726                 │
> > > >>>>>>>>>>>>>>>>>>>>>> ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> This table can then be joined with the tables created by
> > > >>>>>>>>>>>>>>>>>>>>>> the already existing `savepoint` connector, based on the
> > > >>>>>>>>>>>>>>>>>>>>>> UID hash (which is unique and always exists).
> > > >>>>>>>>>>>>>>>>>>>>>> This would mean that the already existing table would need
> > > >>>>>>>>>>>>>>>>>>>>>> only a single metadata column, which is the UID hash.
> > > >>>>>>>>>>>>>>>>>>>>>> WDYT?
> > > >>>>>>>>>>>>>>>>>>>>>> @zakelly, plz share your thoughts
> > > >> too.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> If we opt to use metadata columns, every record in the table would end
> > > >>>>>>>>>>>>>>>>>>>>>>> up having identical values for these columns (please correct me if I’m
> > > >>>>>>>>>>>>>>>>>>>>>>> mistaken). On the other hand, the state connector requires users to
> > > >>>>>>>>>>>>>>>>>>>>>>> specify an operator UID or operator UID hash, after which it outputs
> > > >>>>>>>>>>>>>>>>>>>>>>> user-defined values in its records. This approach feels somewhat
> > > >>>>>>>>>>>>>>>>>>>>>>> redundant to me.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> If we add a new `savepoint-metadata` connector, then this can be
> > > >>>>>>>>>>>>>>>>>>>>>> addressed.
> > > >>>>>>>>>>>>>>>>>>>>>> On the other hand, UID and UID hash have an either-or relationship from
> > > >>>>>>>>>>>>>>>>>>>>>> a config perspective, so a user who provides the UID may still be
> > > >>>>>>>>>>>>>>>>>>>>>> interested in the hash for further calculations (all of Flink's
> > > >>>>>>>>>>>>>>>>>>>>>> internals depend on the hash). Printing the human-readable UID is an
> > > >>>>>>>>>>>>>>>>>>>>>> explicit requirement from the user side because hashes are not
> > > >>>>>>>>>>>>>>>>>>>>>> human-readable.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> 3. Handling LIST and MAP States in the State Connector
> > > >>>>>>>>>>>>>>>>>>>>>>> I have concerns about how the current design handles LIST and MAP
> > > >>>>>>>>>>>>>>>>>>>>>>> states. Specifically, the state connector uses Flink SQL’s MAP and
> > > >>>>>>>>>>>>>>>>>>>>>>> ARRAY types, which implies that it attempts to load entire MAP or LIST
> > > >>>>>>>>>>>>>>>>>>>>>>> states into memory.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> However, in many real-world scenarios, these states can grow very
> > > >>>>>>>>>>>>>>>>>>>>>>> large. Typically, the state API addresses this by providing an iterator
> > > >>>>>>>>>>>>>>>>>>>>>>> to traverse elements within the state incrementally. I’m unsure whether
> > > >>>>>>>>>>>>>>>>>>>>>>> I’ve missed something in FLIP-496 or FLIP-512, but it seems that the
> > > >>>>>>>>>>>>>>>>>>>>>>> current design might struggle with scalability in such cases.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> You see it correctly: the current implementation keeps the state for a
> > > >>>>>>>>>>>>>>>>>>>>>> single key in memory.
> > > >>>>>>>>>>>>>>>>>>>>>> We considered this potential issue back then and concluded that
> > > >>>>>>>>>>>>>>>>>>>>>> addressing it is not strictly necessary for the initial version and can
> > > >>>>>>>>>>>>>>>>>>>>>> be done as a later improvement.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Up until now, even in TB-sized savepoints, we've seen that the number
> > > >>>>>>>>>>>>>>>>>>>>>> of keys can be extremely large, but not the per-key state itself.
> > > >>>>>>>>>>>>>>>>>>>>>> But again, this would be a good feature and can be handled in a
> > > >>>>>>>>>>>>>>>>>>>>>> separate Jira.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > >>>>>>>>>>>>>>>>>>>>>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 02:00, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> Hi Zakelly,
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> To keep things simple, the target is `METADATA VIRTUAL` as the keywords
> > > >>>>>>>>>>>>>>>>>>>>>>>> for definition.
> > > >>>>>>>>>>>>>>>>>>>>>>>> If it's not too complex, the latter (`METADATA FROM xxx VIRTUAL`) can
> > > >>>>>>>>>>>>>>>>>>>>>>>> be added too.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> BR,
> > > >>>>>>>>>>>>>>>>>>>>>>>> G
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi Gabor,
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> +1 for this.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Will the metadata column use `METADATA VIRTUAL` as keywords for
> > > >>>>>>>>>>>>>>>>>>>>>>>>> definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Kafka table?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Zakelly
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> Hi All,
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to start a discussion of FLIP-512: Add meta information to SQL
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> state connector [1].
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> Feel free to add your thoughts to make this feature better.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> BR,
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> G