Re: [DISCUSS] FLIP-512: Add meta information to SQL state connector

Shengkai Fang Wed, 19 Mar 2025 19:38:06 -0700

Hi.

After reading the doc[1], I think Spark provides a function for users to
consume the metadata from the savepoint. In Flink SQL, similar
functionality can be implemented through Polymorphic Table Functions (PTFs), as
proposed in FLIP-440[2]. Below is a code example[3] illustrating this
concept:


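```java
    // From Flink's planner tests [3]: a PTF with two scalar arguments.
    // TestProcessTableFunctionBase is a test-only helper class; its
    // collectObjects(...) forwards the given arguments to the function output.
    public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
        public void eval(Integer i, Boolean b) {
            collectObjects(i, b);
        }
    }
```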

```sql
INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
```

So we could add a built-in function named `read_state_metadata` to read
savepoint metadata.

Best,
Shengkai

[1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
[3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140

On Wed, Mar 19, 2025 at 18:37, Gyula Fóra <[email protected]> wrote:

> Hi All!
>
> Thank you for the answers and concerns from everyone.
>
> On the CLI vs State Metadata Connector/Table question I would also like to
> step back a little and look at the bigger picture.
>
> I think the overall vision in Flink SQL is to provide a SQL-native
> environment where we can serve complex use cases like you would expect in a
> regular database.
> Most features and developments in recent years have gone this way.
>
> The State Metadata Table would be a natural and straightforward fit here.
> So from my side, +1 for that.
>
> However, I could understand if we are not ready to add a new
> connector/format due to maintenance concerns (and general concerns about
> the design).
> If that's the issue, then we should spend more time on the design to get
> comfortable with the approach and seek feedback from the wider community.
>
> I am -1 for the CLI/tooling approach, as it would not provide the feature
> set we are looking for beyond what the Java connector already covers. And
> that approach would come with the same maintenance implications.
>
> Cheers
> Gyula
>
>
> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <[email protected]> wrote:
>
> > Hi Zakelly, Shengkai,
> >
> > Several topics are being discussed, so I'm adding brief answers to them. If
> > some topic is not covered, please highlight it.
> >
> > @Shengkai: I've read through all the previous FLIPs related to catalogs, and
> > if we would like to keep the concepts there, then a one-to-one mapping
> > between savepoint and catalog is a reasonable direction. In short, I'm happy
> > that you've highlighted this and agree as a whole. I've written it down
> > previously, but just want to double-confirm that a state catalog is
> > essential and planned. When we reach that point, your input is more than
> > welcome.
> >
> > @Zakelly: We've already tried the CLI and separate-library approaches with
> > users, and they were not welcome, for the following reasons:
> > * Users want automated tasks, not manual parsing of CLI/library output. This
> > can be hacked around, but our experience with that is negative because it's
> > just brittle.
> > * From a development perspective, it's a much bigger effort than a connector
> > (hard to test, packaging/version handling is an extra layer of complexity,
> > external FS authentication is a pain for users, and it expects them to
> > download savepoints as well).
> > * Purely personal opinion, but if we find better ways later, retiring a CLI
> > is no more lightweight than retiring a connector.
> >
> > > It would be great if you give some examples on how user could leverage
> > > the separate connector to process the metadata.
> >
> > The simplest cases:
> > * give me the overgrowing state uids
> > * give me the unknown (new or renamed) state uids
> > * give me the state uids where the state size dropped drastically compared
> > to a previous savepoint (accidental state loss)
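To make these three cases concrete, here is a minimal Java sketch, assuming each savepoint's metadata has already been read into a map from operator uid to total state size in bytes (for example from a per-operator metadata table). The class name, method names, and thresholds are hypothetical, not part of Flink or the FLIP.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: each savepoint's metadata is reduced to a map from
// operator uid to total state size in bytes (one metadata row per operator).
final class SavepointChecks {

    // "Overgrowing" state: uids whose state exceeds an absolute size limit.
    static Set<String> overgrowing(Map<String, Long> current, long limitBytes) {
        Set<String> result = new TreeSet<>();
        current.forEach((uid, size) -> {
            if (size > limitBytes) {
                result.add(uid);
            }
        });
        return result;
    }

    // Unknown uids: present in the current savepoint but not in the known set
    // (for example new or renamed operators).
    static Set<String> unknown(Map<String, Long> current, Set<String> knownUids) {
        Set<String> result = new TreeSet<>(current.keySet());
        result.removeAll(knownUids);
        return result;
    }

    // Drastic drops: uids whose size fell below (1 - maxDropRatio) of the previous
    // savepoint's size; a uid missing from the current savepoint counts as size 0.
    static Set<String> drasticDrops(
            Map<String, Long> previous, Map<String, Long> current, double maxDropRatio) {
        Set<String> result = new TreeSet<>();
        previous.forEach((uid, before) -> {
            long after = current.getOrDefault(uid, 0L);
            if (before > 0 && after < before * (1.0 - maxDropRatio)) {
                result.add(uid);
            }
        });
        return result;
    }
}
```

With previous sizes `{source-uid=1000, join-uid=50000}`, current sizes `{source-uid=1200, join-uid=2000, window-uid=900000}`, a 100000-byte limit and a 0.5 drop ratio, `overgrowing` and `unknown` both return `[window-uid]`, and `drasticDrops` returns `[join-uid]`.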
> >
> > Since it was mentioned, as a general off-topic teaser: yes, it would be good
> > to have some sort of checkpoint/savepoint lineage, or whatever we call it.
> > Since we've not yet reached this point, there are no technical details; it's
> > more of a vision. It's a common pattern that jobs are physically running but
> > the state processing is somehow stuck, and it would be good to have a way to
> > detect this automatically.
> > The important point here is automation rather than manual evaluation, since
> > handling 10k+ jobs simply doesn't allow for it.
> >
> > BR,
> > G
> >
> >
> > On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <[email protected]> wrote:
> >
> > > Hi, All.
> > >
> > > About State Catalog, I want to share more thoughts about this.
> > >
> > > In the initial design concept, I understood that a savepoint and a state
> > > catalog have a one-to-one mapping relationship. Each operator corresponds
> > > to a database, and the state of each operator is represented as individual
> > > tables. The rationale behind this design is:
> > >
> > > *State Diversity*: An operator may involve multiple types of states. For
> > > example, in our VVR design, a "multi-join" operator uses keyed states for
> > > two input streams and a broadcast state for the third stream. This makes it
> > > challenging to represent all states of an operator within a single table.
> > > *Scalability*: Internally, an operator might have multiple keyed states
> > > (e.g., value state and list state). However, large list states may not fit
> > > entirely in memory. To address this, we recommend implementing each state
> > > as a separate table.
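A minimal sketch of how this mapping (savepoint to catalog, operator to database, individual state to table) could translate into fully qualified table identifiers, assuming the `savepoint-${id}` catalog naming used later in this message. `StateTablePaths` and the example names are hypothetical.

```java
// Hypothetical sketch of the one-to-one mapping:
// savepoint -> catalog, operator -> database, individual state -> table.
// The "savepoint-<id>" catalog naming is an assumption for illustration.
final class StateTablePaths {

    // Quote an identifier with backticks, doubling any embedded backticks.
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Fully qualified path of the table holding one state of one operator.
    static String tablePath(String savepointId, String operatorName, String stateName) {
        return quote("savepoint-" + savepointId) + "." + quote(operatorName) + "." + quote(stateName);
    }
}
```

For example, `tablePath("42", "multi-join", "left-keyed-state")` yields ``` `savepoint-42`.`multi-join`.`left-keyed-state` ```, so each keyed or broadcast state of the multi-join operator would get its own table under one database.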
> > >
> > > To resolve the loosely coupled relationships between operator states, we
> > > propose embedding predefined views within the catalog. These views simplify
> > > user understanding of operator implementations and provide a more intuitive
> > > perspective. For instance, a join operator may have multiple state
> > > implementations (depending on whether the join key includes unique
> > > attributes), but users primarily care about the data associated with a
> > > specific join key across input streams.
> > >
> > > Returning to the one-to-one mapping between savepoints and catalogs, we aim
> > > to manage multiple user state catalogs through a catalog store. When a user
> > > triggers a savepoint for a job on the platform:
> > >
> > > 1. The platform sends a REST request to the JobManager.
> > > 2. Simultaneously, it registers a new state catalog in the catalog store,
> > > enabling immediate analysis of state data on the platform.
> > > 3. Deleting a savepoint would also trigger the removal of its associated
> > > catalog.
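The lifecycle steps above could be modeled roughly as follows. This is a hypothetical sketch: it only tracks the savepoint-to-catalog mapping in memory, whereas a real platform would presumably persist entries through a catalog store (Flink provides a `CatalogStore` interface for persisting catalog configurations). The class and method names are illustrative.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the platform-side bookkeeping: one state catalog per
// savepoint. An in-memory map stands in for a persistent catalog store.
final class StateCatalogRegistry {
    private final Map<String, String> savepointPathByCatalog = new ConcurrentHashMap<>();

    static String catalogName(String savepointId) {
        return "savepoint-" + savepointId;
    }

    // Step 2: register a state catalog for the triggered savepoint.
    String register(String savepointId, String savepointPath) {
        String name = catalogName(savepointId);
        savepointPathByCatalog.put(name, savepointPath);
        return name;
    }

    // Step 3: deleting the savepoint also removes its catalog.
    boolean unregister(String savepointId) {
        return savepointPathByCatalog.remove(catalogName(savepointId)) != null;
    }

    Optional<String> savepointPath(String catalogName) {
        return Optional.ofNullable(savepointPathByCatalog.get(catalogName));
    }
}
```

Keeping catalog naming in one place (`catalogName`) means registration and removal cannot drift apart, which is the main invariant the one-to-one mapping relies on.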
> > >
> > > This vision assumes that states are self-describing or that a state
> > > metaservice is introduced to analyze savepoint structures.
> > >
> > > > How can users create logic to identify differences between multiple
> > > > savepoints?
> > >
> > > Since savepoints and state catalogs are one-to-one mapped, users can query
> > > metadata via their respective catalogs. For example:
> > >
> > > 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>` provides
> > > operator-specific metadata (e.g., state size, type).
> > > 2. Comparing metadata tables (e.g., schema versions, state entry counts)
> > > across catalogs reveals structural or quantitative differences.
> > > 3. For deeper analysis, users could write SQL queries to compare specific
> > > state partitions or leverage the metaservice to track state evolution
> > > (e.g., added/removed operators, modified state configurations).
> > >
> > > If we plan to introduce a state catalog in the future, I would lean toward
> > > using metadata tables. If a utility tool can address the challenges we
> > > face, could we avoid introducing an additional connector?
> > >
> > > Best,
> > > Shengkai
> > >
> > > On Mon, Mar 17, 2025 at 20:25, Gyula Fóra <[email protected]> wrote:
> > >
> > > > Hi All!
> > > >
> > > > Without going into too much detail, here are my 2 cents regarding the
> > > > virtual column / catalog metadata / table (connector) discussion for the
> > > > state metadata.
> > > >
> > > > State metadata such as the types of states, their properties, names, sizes,
> > > > etc. is all valuable information that can be used to enrich the
> > > > computations we do on state.
> > > > We can analyze it standalone (such as discovering anomalies for large jobs
> > > > with many states), across multiple savepoints (discovering how state
> > > > changed over time), or by joining it with keyed or non-keyed state data to
> > > > serve more complex queries on the state.
> > > >
> > > > The only solution that seems to serve all these use cases and requirements
> > > > in a straightforward and SQL-canonical way is to simply expose the state
> > > > metadata as a separate table. This is a metadata table, but you can also
> > > > think of it as a data table; it makes no practical difference here.
> > > >
> > > > Once we have a catalog later, the catalog can offer this table out of the
> > > > box, the same way databases provide metadata tables. For this to work,
> > > > however, we need another, simpler connector that creates this table.
> > > >
> > > > +1 for state metadata as a separate connector/table, instead of adding
> > > > virtual columns and ad hoc catalog metadata that is hard to use in a large
> > > > number of queries.
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > > > On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <[email protected]> wrote:
> > > >
> > > > > 1. State TTL for Value Columns
> > > > >
> > > > > > I’m planning on adding this, and we may collaborate on it in the future.
> > > > >
> > > > > +1 on this, just ping me.
> > > > >
> > > > > 2. Metadata Table vs. Metadata Column
> > > > >
> > > > > After some code digging and a POC, all I can say is that with heavy effort
> > > > > we could maybe make changes that allow us to show the metadata of a
> > > > > savepoint from the catalog.
> > > > > I'm not against that, but from a user perspective this has limited value;
> > > > > let me explain why.
> > > > >
> > > > > From a high-level perspective, I see the following points, on which I see
> > > > > agreement:
> > > > > * We should have a catalog that represents the savepoint data set of one
> > > > > or more jobs (future plan)
> > > > > * Savepoints should be able to be registered in the catalog, where they
> > > > > become databases (future plan)
> > > > > * There must be a possibility to create tables from databases where users
> > > > > can read state data (exists already)
> > > > >
> > > > > In terms of metadata, if I understand correctly, the suggested approach
> > > > > would be to access it from the catalog describe command, right? Adding
> > > > > that info when a specific database describe command is executed could be
> > > > > done.
> > > > >
> > > > > The question is, for instance, how can users create logic that tells them
> > > > > the difference between multiple savepoints?
> > > > > Just to give some examples:
> > > > > * per-operator size changes between savepoints
> > > > > * show values from operator data where the state size reaches a boundary
> > > > > * in general, "find which checkpoint ruined things" is quite a common
> > > > > pattern
> > > > > What I would like to highlight here is that from Flink's point of view the
> > > > > metadata can be considered static side-output information, but for users
> > > > > these values are actual real data around which they plan to build logic.
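The "find which checkpoint ruined things" pattern is the kind of logic users would want to run over metadata rows. A minimal sketch, assuming the total state size per checkpoint is already available in chronological order; the class, the record, and the `factor` threshold are illustrative choices, not part of the FLIP (the field name borrows the `totalStatesSizeInBytes` column from the metadata table example in this thread).

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch: given total state sizes per checkpoint in chronological
// order, return the first checkpoint whose size grew or shrank by more than
// 'factor' relative to the checkpoint right before it.
final class FirstSuspiciousCheckpoint {

    record CheckpointSize(long checkpointId, long totalStatesSizeInBytes) {}

    static OptionalLong find(List<CheckpointSize> history, double factor) {
        for (int i = 1; i < history.size(); i++) {
            long before = history.get(i - 1).totalStatesSizeInBytes();
            long after = history.get(i).totalStatesSizeInBytes();
            boolean grew = after > before * factor;
            boolean shrank = after * factor < before;
            if (grew || shrank) {
                return OptionalLong.of(history.get(i).checkpointId());
            }
        }
        return OptionalLong.empty();
    }
}
```

For sizes 1000, 1100, 100 and 120 across checkpoints 1 to 4 and a factor of 2.0, `find` returns checkpoint 3, the first one whose size collapsed.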
> > > > >
> > > > > > The metadata is more like one-time information instead of streaming
> > > > > > data that changes all the time, so a single connector seems like
> > > > > > overkill.
> > > > >
> > > > > State data is also static within a savepoint, and that's the reason why
> > > > > the State Processor API works in batch mode.
> > > > > When we handle multiple checkpoints in a streaming fashion, this can be
> > > > > viewed from another angle.
> > > > >
> > > > > We can come up with a more lightweight solution than a new connector, but
> > > > > forcing users to parse the catalog describe command output in order to
> > > > > compare multiple savepoints doesn't sound like a smooth user experience.
> > > > > Honestly, I have no other idea for exposing metadata as real user data, so
> > > > > I'm waiting on other approaches.
> > > > >
> > > > > BR,
> > > > > G
> > > > >
> > > > >
> > > > > On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <[email protected]> wrote:
> > > > >
> > > > > > Looking forward to hearing the good news!
> > > > > >
> > > > > > Best,
> > > > > > Shengkai
> > > > > >
> > > > > > On Wed, Mar 12, 2025 at 22:24, Gabor Somogyi <[email protected]> wrote:
> > > > > >
> > > > > > > Thanks to both of you for the valuable input!
> > > > > > >
> > > > > > > Let me take a closer look at the suggestions, like the Catalog
> > > > > > > capabilities and the possibility of embedding TypeInformation or
> > > > > > > StateDescriptor metadata directly into the raw state files...
> > > > > > >
> > > > > > > BR,
> > > > > > > G
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <[email protected]> wrote:
> > > > > > >
> > > > > > > > Thanks for Zakelly's clarification.
> > > > > > > >
> > > > > > > > 1. State TTL for Value Columns
> > > > > > > >
> > > > > > > > +1 to delay the discussion about this.
> > > > > > > >
> > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > >
> > > > > > > > I’d like to share my perspective on the State Catalog proposal. While
> > > > > > > > introducing this capability is beneficial, there is a blocker: the
> > > > > > > > current StateBackend architecture does not permit operators to encode
> > > > > > > > TypeInformation into the state; it only preserves the Serializer. This
> > > > > > > > limitation creates an asymmetry, as operators alone retain knowledge of
> > > > > > > > the data structure’s schema.
> > > > > > > >
> > > > > > > > To address this, I suggest allowing operators to embed TypeInformation
> > > > > > > > or StateDescriptor metadata directly into the raw state files. Such a
> > > > > > > > design would enable the Catalog to:
> > > > > > > >
> > > > > > > > 1. Parse state files and programmatically derive the schema and
> > > > > > > > structural guarantees for each state.
> > > > > > > > 2. Leverage existing Flink Table utilities, such as
> > > > > > > > LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils),
> > > > > > > > to bridge TypeInformation and DataType conversions.
> > > > > > > >
> > > > > > > > If we cannot store the TypeInformation or StateDescriptor in the raw
> > > > > > > > state files, I am +1 for this FLIP using metadata columns to retrieve
> > > > > > > > the information.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Shengkai
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Mar 12, 2025 at 12:43, Zakelly Lan <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Gabor and Shengkai,
> > > > > > > > >
> > > > > > > > > Thanks for sharing your thoughts! This is a long discussion, and sorry
> > > > > > > > > for the late reply (I've been busy catching up with the 2.0 release
> > > > > > > > > these days).
> > > > > > > > >
> > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Let me first clarify your thoughts to ensure I understand correctly.
> > > > > > > > > IIUC, there is no persistent configuration for state TTL in the
> > > > > > > > > checkpoint. While you can infer that TTL is enabled by reading the
> > > > > > > > > serializer, the checkpoint itself only stores the last access time for
> > > > > > > > > each value. So the only thing we can show is the last access time of
> > > > > > > > > each value. But not all state backends are required to store this, as
> > > > > > > > > they may directly store the expiration time. This would also increase
> > > > > > > > > the difficulty of implementation & maintenance.
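Under the model described above, the expiration time is not in the checkpoint and has to be derived. A minimal sketch, assuming the backend persisted each value's last-access timestamp and the TTL duration is supplied from the job's configuration; the class name is hypothetical, and treating "expiration time equal to now" as expired is a boundary choice made for illustration rather than a statement about Flink's internal TTL classes.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch, assuming a backend that persists only each value's last-access time
// (not the expiration time) and a TTL duration taken from the job's
// configuration, since the checkpoint itself does not store it.
final class TtlExpiry {

    static Instant expirationTime(Instant lastAccess, Duration ttl) {
        return lastAccess.plus(ttl);
    }

    // Treated as expired once the expiration time is at or before 'now'.
    static boolean isExpired(Instant lastAccess, Duration ttl, Instant now) {
        return !expirationTime(lastAccess, ttl).isAfter(now);
    }
}
```

For a last access at epoch millisecond 1000 and a 10-second TTL, the value expires at millisecond 11000: `isExpired` is false at 10999 and true at 11000. A backend that stores the expiration time directly would skip `expirationTime` entirely, which is why a unified metadata format matters here.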
> > > > > > > > >
> > > > > > > > > This once again reiterates the importance of unified metadata for
> > > > > > > > > checkpoints. I’m planning on adding this, and we may collaborate on it
> > > > > > > > > in the future.
> > > > > > > > >
> > > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm not in favor of adding a new connector for metadata. The metadata
> > > > > > > > > is more like one-time information instead of streaming data that
> > > > > > > > > changes all the time, so a single connector seems like overkill. It is
> > > > > > > > > not easy to withdraw a connector if we have a better solution in the
> > > > > > > > > future. I'm not familiar with the current Catalog capabilities, but if
> > > > > > > > > it could extract and show some operator-level information from the
> > > > > > > > > savepoint, that would be great.
> > > > > > > > >
> > > > > > > > > If the Catalog can't do that, I would consider the current FLIP to be a
> > > > > > > > > compromise solution.
> > > > > > > > >
> > > > > > > > > And if we have that unified metadata for checkpoints/savepoints in the
> > > > > > > > > future, we may directly register a savepoint in the catalog and create
> > > > > > > > > a source without specifying complex columns, as well as describe the
> > > > > > > > > savepoint catalog to get the metadata. That's a good solution in my
> > > > > > > > > mind.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Zakelly
> > > > > > > > >
> > > > > > > > > On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Gabor,
> > > > > > > > > >
> > > > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > > > > >
> > > > > > > > > > I would argue against introducing a new connector type named
> > > > > > > > > > savepoint-metadata, as the existing Catalog mechanism can inherently
> > > > > > > > > > provide the necessary connector factory capabilities. I’ve detailed
> > > > > > > > > > this proposal in branch[1]. Please take a moment to review it.
> > > > > > > > > >
> > > > > > > > > > If we introduce a connector named `savepoint-metadata`, it means users
> > > > > > > > > > can create a temporary table with connector `savepoint-metadata`, and
> > > > > > > > > > the connector needs to check whether the table schema is the same as
> > > > > > > > > > the schema we proposed in the FLIP. On the other hand, it's not easy
> > > > > > > > > > for users to create a metadata table with the same schema.
> > > > > > > > > >
> > > > > > > > > > [1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Shengkai
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 11, 2025 at 16:56, Gabor Somogyi <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Shengkai,
> > > > > > > > > > >
> > > > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > > >
> > > > > > > > > > > From a directional perspective, I agree with your idea of how it can
> > > > > > > > > > > be implemented. Previously I mentioned that TTL information is not
> > > > > > > > > > > exposed on the State Processor API (which the SQL state connector uses
> > > > > > > > > > > to read data), and unless somebody shows me otherwise, this FLIP is not
> > > > > > > > > > > going to address it, to avoid feature creep. Our users are also
> > > > > > > > > > > interested in TTL, so sooner or later we're going to expose it; this is
> > > > > > > > > > > a matter of scheduling.
> > > > > > > > > > >
> > > > > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > > > > > >
> > > > > > > > > > > I'm not sure I understand your point related to StateCatalog at all.
> > > > > > > > > > > First of all, I couldn't agree more that StateCatalog is needed and is
> > > > > > > > > > > a planned building block in an upcoming FLIP, but I'm not sure how it
> > > > > > > > > > > can help now. No matter what, your knowledge is essential when we add
> > > > > > > > > > > StateCatalog. Let me lay out my understanding in this area:
> > > > > > > > > > > * First we need CREATE TABLE statements to access state data and
> > > > > > > > > > > metadata
> > > > > > > > > > > * When we have that, we can add StateCatalog, which could potentially
> > > > > > > > > > > ease the life of users by, e.g., giving them off-the-shelf tables
> > > > > > > > > > > without sweating over CREATE TABLE statements
> > > > > > > > > > >
> > > > > > > > > > > User expectations:
> > > > > > > > > > > * See state data (this is fulfilled by the existing connector)
> > > > > > > > > > > * See metadata about state data like TTL (this can be added as a
> > > > > > > > > > > metadata column, as you suggested, since it belongs to the data)
> > > > > > > > > > > * See metadata about operators (this can be added via
> > > > > > > > > > > savepoint-metadata)
> > > > > > > > > > >
> > > > > > > > > > > It's important to highlight that the state data table format differs
> > > > > > > > > > > from the state metadata table format. Namely, one table has rows for
> > > > > > > > > > > state values and the other has rows for operators, right?
> > > > > > > > > > > I think that's the reason why you've pointed out that the suggested
> > > > > > > > > > > metadata columns are somewhat clunky.
> > > > > > > > > > >
> > > > > > > > > > > In conclusion, I agree to add the ${state-name}_ttl metadata column
> > > > > > > > > > > later on, since it belongs to the state value, and to add a new table
> > > > > > > > > > > type for metadata (similar to PG [1], as you suggested).
> > > > > > > > > > > Please see how Spark does that too [2].
> > > > > > > > > > >
> > > > > > > > > > > If you have a better approach, then please elaborate with more details
> > > > > > > > > > > and help me understand your point.
> > > > > > > > > > >
> > > > > > > > > > > > Up until now we've seen, even in TB savepoints, that the number of
> > > > > > > > > > > > keys can be extremely huge, but not the per-key state itself.
> > > > > > > > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > > > > > > > separate jira.
> > > > > > > > > > >
> > > > > > > > > > > I've just created https://issues.apache.org/jira/browse/FLINK-37456.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > > > > > [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
> > > > > > > > > > >
> > > > > > > > > > > BR,
> > > > > > > > > > > G
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi, Gabor. Thanks for your response.
> > > > > > > > > > > >
> > > > > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you for addressing the limitations here. However, I believe it
> > > > > > > > > > > > would be beneficial to further clarify the API in this FLIP regarding
> > > > > > > > > > > > how users can specify the TTL column.
> > > > > > > > > > > >
> > > > > > > > > > > > One potential approach that comes to mind is using a standardized
> > > > > > > > > > > > naming convention such as ${state-name}_ttl for the metadata column
> > > > > > > > > > > > that defines the TTL value. In terms of implementation, the
> > > > > > > > > > > > listReadableMetadata function could:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Read the table’s columns and configuration,
> > > > > > > > > > > > 2. Extract all defined state names, and
> > > > > > > > > > > > 3. Return a structured list of metadata entries formatted as
> > > > > > > > > > > > ${state-name}_ttl.
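A minimal sketch of these three steps, assuming the state names have already been extracted from the table's columns and options. In Flink, `SupportsReadingMetadata#listReadableMetadata()` returns a `Map<String, DataType>`; here a plain type-name string stands in for `DataType` so the example runs without Flink on the classpath. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the ${state-name}_ttl convention. A String type name
// stands in for Flink's DataType so the example has no Flink dependency.
final class TtlMetadataKeys {
    static final String TTL_SUFFIX = "_ttl";

    // stateNames: state names already extracted from the table's columns and
    // options (steps 1-2). Returns one "<state-name>_ttl" entry per state (step 3),
    // preserving the declaration order of the states.
    static Map<String, String> listReadableMetadata(List<String> stateNames, String ttlTypeName) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String stateName : stateNames) {
            metadata.put(stateName + TTL_SUFFIX, ttlTypeName);
        }
        return metadata;
    }
}
```

`listReadableMetadata(List.of("count", "sum"), "BIGINT")` yields `{count_ttl=BIGINT, sum_ttl=BIGINT}`; `BIGINT` is only a placeholder, since the thread does not settle the TTL column's type.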
> > > > > > > > > > > >
> > > > > > > > > > > > WDYT?
> > > > > > > > > > > >
> > > > > > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > > > > > > >
> > > > > > > > > > > > Introducing a new connector type at this stage may unnecessarily
> > > > > > > > > > > > complicate the system. Given that every table already belongs to a
> > > > > > > > > > > > Catalog, which is designed to provide a Factory for building source or
> > > > > > > > > > > > sink connectors, I propose integrating a dedicated StateCatalog
> > > > > > > > > > > > instead. This approach would allow us to:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Leverage the Catalog’s existing capabilities to manage TTL metadata
> > > > > > > > > > > > (e.g., state names and TTL logic) without duplicating functionality.
> > > > > > > > > > > > 2. Provide a unified interface for connector instantiation and metadata
> > > > > > > > > > > > handling through the Catalog’s Factory pattern.
> > > > > > > > > > > >
> > > > > > > > > > > > Would this design decision better align with our architecture’s
> > > > > > > > > > > > extensibility and reduce redundancy?
> > > > > > > > > > > >
> > > > > > > > > > > > > Up until now we've seen, even in TB savepoints, that the number of
> > > > > > > > > > > > > keys can be extremely huge, but not the per-key state itself.
> > > > > > > > > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > > > > > > > > separate jira.
> > > > > > > > > > > >
> > > > > > > > > > > > +1 for a separate jira.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Shengkai
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 10, 2025 at 19:05, Gabor Somogyi <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Shengkai,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please see my comments inline.
> > > > > > > > > > > > >
> > > > > > > > > > > > > BR,
> > > > > > > > > > > > > G
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Gabor. Thanks for the FLIP. I have some questions about the
> > > > > > > > > > > > > > FLIP:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > > > > > > How can users retrieve the state TTL (Time-to-Live) for each value
> > > > > > > > > > > > > > column?
> > > > > > > > > > > > > > From my understanding of the current design, it seems that this
> > > > > > > > > > > > > > functionality is not supported. Could you clarify if there are plans
> > > > > > > > > > > > > > to address this limitation?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Since the State Processor API does not yet expose this information,
> > > > > > > > > > > > > this would require several steps.
> > > > > > > > > > > > > First, support needs to be added to the State Processor API, which
> > > > > > > > > > > > > can then be exposed on the SQL API.
> > > > > > > > > > > > > This is definitely a useful future improvement, which can be handled
> > > > > > > > > > > > > in a separate jira.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > > > > > > > > The metadata information described in the FLIP appears to be
> > > > > > > > > > > > > > intended to describe the state files stored at a specific location.
> > > > > > > > > > > > > > To me, this concept aligns more closely with system tables like
> > > > > > > > > > > > > > pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Adding a new connector with `savepoint-metadata` is one way we could
> > > > > > > > > > > > > create such functionality.
> > > > > > > > > > > > > I'm not against that, just want to reach a common agreement that we
> > > > > > > > > > > > > would like to move in that direction.
> > > > > > > > > > > > > (As a side note, not just PG but Spark also has a similar approach,
> > > > > > > > > > > > > and I basically like the idea.)
> > > > > > > > > > > > > If we go that direction, savepoint metadata can be exposed in a way
> > > > > > > > > > > > > that one row represents an operator with its values, something like
> > > > > > > > > > > > > this:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ┌─────────────────────────┬─────────────────────────────┬────────────────────────────────┬───────────┬──────────────┬──────────────────┬───────────────────────────┬──────────────────────┐
> > > > > > > > > > > > > > │operatorName             │operatorUid                  │operatorHash                    │parallelism│maxParallelism│subtaskStatesCount│coordinatorStateSizeInBytes│totalStatesSizeInBytes│
> > > > > > > > > > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > > > > > > > > > │Source: datagen-source   │datagen-source-uid           │47aee94394d6ea26e2d544bef0a37bb5│2          │128           │2                 │16                         │546                   │
> > > > > > > > > > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > > > > > > > > > │long-udf-with-master-hook│long-udf-with-master-hook-uid│6ed3f40bff3c8dfcdfcb95128a1018f1│2          │128           │2                 │0                          │0                     │
> > > > > > > > > > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > > > > > > > > > │value-process            │value-process-uid            │ca4f5fe9a637b656f09ea78b3e7a15b9│2          │128           │2                 │0                          │40726                 │
> > > > > > > > > > > > > > └─────────────────────────┴─────────────────────────────┴────────────────────────────────┴───────────┴──────────────┴──────────────────┴───────────────────────────┴──────────────────────┘
> > > > > > > > > > > > >
> > > > > > > > > > > > > > This table could then be joined with the tables created by the existing
> > > > > > > > > > > > > > `savepoint` connector on the UID hash (which is unique and always exists).
> > > > > > > > > > > > > > This would mean that the existing table would need only a single metadata
> > > > > > > > > > > > > > column: the UID hash.
> > > > > > > > > > > > > > WDYT?
> > > > > > > > > > > > > > @zakelly, please share your thoughts too.
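A minimal sketch of the proposed join, written in plain Java rather than Flink SQL so that it runs standalone. The record types `OperatorMetadata`, `StateRow` and `EnrichedRow` and the sample keys and values are hypothetical: they assume a `savepoint-metadata` table with one row per operator (a subset of the columns in the example table) and an existing `savepoint` connector table extended with a single UID-hash metadata column. This illustrates the join semantics only; it is not the connector's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SavepointMetadataJoin {

    // Hypothetical `savepoint-metadata` row: one row per operator
    // (a subset of the columns in the example table).
    public record OperatorMetadata(String operatorName, String operatorUid,
                                   String operatorUidHash, int parallelism,
                                   int maxParallelism, long totalStatesSizeInBytes) {}

    // Hypothetical row of an existing `savepoint` connector table, extended
    // with a single metadata column holding the operator UID hash.
    public record StateRow(String key, long value, String operatorUidHash) {}

    // A state row enriched with the metadata of the operator that owns it.
    public record EnrichedRow(String key, long value, String operatorName, int parallelism) {}

    // Inner equi-join, conceptually:
    //   state JOIN metadata ON state.operatorUidHash = metadata.operatorUidHash
    public static List<EnrichedRow> join(List<StateRow> state, List<OperatorMetadata> metadata) {
        // The UID hash is unique per operator, so it can key a lookup map
        // (Collectors.toMap throws IllegalStateException on a duplicate hash).
        Map<String, OperatorMetadata> byHash = metadata.stream()
                .collect(Collectors.toMap(OperatorMetadata::operatorUidHash, Function.identity()));
        List<EnrichedRow> result = new ArrayList<>();
        for (StateRow row : state) {
            OperatorMetadata m = byHash.get(row.operatorUidHash());
            if (m != null) { // inner join: skip rows without matching metadata
                result.add(new EnrichedRow(row.key(), row.value(), m.operatorName(), m.parallelism()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<OperatorMetadata> metadata = List.of(new OperatorMetadata(
                "value-process", "value-process-uid",
                "ca4f5fe9a637b656f09ea78b3e7a15b9", 2, 128, 40726L));
        // Illustrative state rows; keys and values are made up.
        List<StateRow> state = List.of(
                new StateRow("k1", 10L, "ca4f5fe9a637b656f09ea78b3e7a15b9"),
                new StateRow("k2", 20L, "ca4f5fe9a637b656f09ea78b3e7a15b9"));
        join(state, metadata).forEach(System.out::println);
    }
}
```

Because the UID hash is unique per operator, the metadata side fits in a small lookup map and each state row needs a single lookup; in Flink SQL the same idea would presumably be an ordinary equi-join on the hash column.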
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > If we opt to use metadata columns, every record in the table would end up
> > > > > > > > > > > > > > > having identical values for these columns (please correct me if I’m
> > > > > > > > > > > > > > > mistaken). On the other hand, the state connector requires users to specify
> > > > > > > > > > > > > > > an operator UID or operator UID hash, after which it outputs user-defined
> > > > > > > > > > > > > > > values in its records. This approach feels somewhat redundant to me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > If we add a new `savepoint-metadata` connector, then this can be addressed.
> > > > > > > > > > > > > > On the other hand, UID and UID hash have an either-or relationship from a
> > > > > > > > > > > > > > config perspective, so when a user provides the UID, they may be interested
> > > > > > > > > > > > > > in the hash for further calculations (all of Flink's internals depend on
> > > > > > > > > > > > > > the hash). Printing out the human-readable UID is an explicit requirement
> > > > > > > > > > > > > > from the user side, because hashes are not human readable.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > 3. Handling LIST and MAP States in the State Connector
> > > > > > > > > > > > > > > I have concerns about how the current design handles LIST and MAP states.
> > > > > > > > > > > > > > > Specifically, the state connector uses Flink SQL’s MAP and ARRAY types,
> > > > > > > > > > > > > > > which implies that it attempts to load entire MAP or LIST states into
> > > > > > > > > > > > > > > memory.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > However, in many real-world scenarios, these states can grow very large.
> > > > > > > > > > > > > > > Typically, the state API addresses this by providing an iterator to
> > > > > > > > > > > > > > > traverse elements within the state incrementally. I’m unsure whether I’ve
> > > > > > > > > > > > > > > missed something in FLIP-496 or FLIP-512, but it seems that the current
> > > > > > > > > > > > > > > design might struggle with scalability in such cases.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > You're right: the current implementation keeps the state for a single key
> > > > > > > > > > > > > > in memory.
> > > > > > > > > > > > > > Back in the day we considered this potential issue and concluded that it
> > > > > > > > > > > > > > is not necessarily needed for the initial version and can be done as a
> > > > > > > > > > > > > > later improvement.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Up until now we've seen, even in TB-sized savepoints, that the number of
> > > > > > > > > > > > > > keys can be extremely large, but not the per-key state itself.
> > > > > > > > > > > > > > But again, this is a good feature on its own and can be handled in a
> > > > > > > > > > > > > > separate jira.
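A minimal sketch of the trade-off raised in point 3, in plain Java and under stated assumptions: `materialize` mirrors reading a whole per-key MAP state into a single in-memory value (as the current implementation does for one key), while `explode` wraps the entries in an iterator and produces one row per map entry on demand. `EntryRow`, both method names and the sample entries are hypothetical and not part of the State Processor API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class PerKeyStateReading {

    // Hypothetical flattened output row: one row per MAP state entry.
    public record EntryRow(String key, String mapKey, Long mapValue) {}

    // Variant 1: collect the whole per-key MAP state into one in-memory value,
    // e.g. to emit it as a single SQL MAP column. Memory grows with the
    // number of entries stored under this key.
    public static Map<String, Long> materialize(Iterator<Map.Entry<String, Long>> entries) {
        Map<String, Long> result = new HashMap<>();
        while (entries.hasNext()) {
            Map.Entry<String, Long> e = entries.next();
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    // Variant 2: wrap the entries lazily, producing one row per entry on demand.
    // This wrapper only holds the entry currently being emitted.
    public static Iterator<EntryRow> explode(String key, Iterator<Map.Entry<String, Long>> entries) {
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                return entries.hasNext();
            }

            @Override
            public EntryRow next() {
                if (!entries.hasNext()) {
                    throw new NoSuchElementException();
                }
                Map.Entry<String, Long> e = entries.next();
                return new EntryRow(key, e.getKey(), e.getValue());
            }
        };
    }

    public static void main(String[] args) {
        // Illustrative entries standing in for one key's MAP state.
        List<Map.Entry<String, Long>> stored = List.of(Map.entry("a", 1L), Map.entry("b", 2L));
        System.out.println(materialize(stored.iterator()));
        explode("user-42", stored.iterator()).forEachRemaining(System.out::println);
    }
}
```

Note that the iterator-based variant changes the table shape from one row per key to one row per (key, map key) pair.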
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Shengkai
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > > > > > > > > > [2]
> > > > > > > > > > > > > > > https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Gabor Somogyi <[email protected]> wrote on Monday, March 3, 2025, at 02:00:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Zakelly,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To keep it simple, the target is to use `METADATA VIRTUAL` as the keywords
> > > > > > > > > > > > > > > > for the definition.
> > > > > > > > > > > > > > > > If it's not too complex, the latter (`METADATA FROM xxx VIRTUAL`) can be
> > > > > > > > > > > > > > > > added too.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > BR,
> > > > > > > > > > > > > > > G
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <[email protected]> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Gabor,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1 for this.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Will the metadata column use `METADATA VIRTUAL` as the keywords for the
> > > > > > > > > > > > > > > > > definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
> > > > > > > > > > > > > > > > > Kafka table?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Zakelly
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <[email protected]> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'd like to start a discussion of FLIP-512: Add meta information to SQL
> > > > > > > > > > > > > > > > > > state connector [1].
> > > > > > > > > > > > > > > > > > Feel free to add your thoughts to make this feature better.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > BR,
> > > > > > > > > > > > > > > > > G
> > > > > > > > > > > > > > > > >
>
