Hi, All.

I'd like to share some more thoughts about the State Catalog.

In the initial design, as I understood it, savepoints and state catalogs
have a one-to-one mapping. Each operator corresponds to a database, and
each of the operator's states is represented as a separate table. The
rationale behind this design is:

*State Diversity*: An operator may involve multiple kinds of state. For
example, in our VVR design, a "multi-join" operator uses keyed states for
two input streams and a broadcast state for the third stream. This makes it
challenging to represent all the states of an operator in a single table.

*Scalability*: Internally, an operator might have multiple keyed states
(e.g., a value state and a list state). However, large list states may not
fit entirely in memory. To address this, we recommend representing each
state as a separate table.
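A minimal Python sketch of this layout, assuming the operator and state names can already be read from the savepoint. `StateInfo`, `catalog_layout` and the `savepoint-<id>` catalog name are hypothetical illustrations, not an existing Flink API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateInfo:
    """Hypothetical description of one state declared by an operator."""

    name: str
    kind: str  # e.g. "value", "list", "map" or "broadcast"


def catalog_layout(
    savepoint_id: str, operators: dict[str, list[StateInfo]]
) -> dict[str, dict[str, dict[str, str]]]:
    """Build a catalog -> database -> table layout for one savepoint.

    One catalog per savepoint, one database per operator and one table per
    state (mapped to its state kind), so states of different kinds or sizes
    never have to share a single table.
    """
    catalog = f"savepoint-{savepoint_id}"
    databases = {
        operator: {state.name: state.kind for state in states}
        for operator, states in operators.items()
    }
    return {catalog: databases}


# The "multi-join" example: keyed states for two inputs, a broadcast state for the third.
layout = catalog_layout(
    "42",
    {
        "multi-join": [
            StateInfo("left-input", "value"),
            StateInfo("right-input", "list"),
            StateInfo("dim-input", "broadcast"),
        ]
    },
)
print(layout["savepoint-42"]["multi-join"])
# {'left-input': 'value', 'right-input': 'list', 'dim-input': 'broadcast'}
```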

Because this splits an operator's states into loosely coupled tables, we
propose embedding predefined views in the catalog. These views make
operator implementations easier to understand and give users a more
intuitive perspective. For instance, a join operator may have multiple state
implementations (depending on whether the join key includes unique
attributes), but users primarily care about the data associated with a
specific join key across the input streams.
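As a rough sketch of what such a predefined view could compute for a two-input join, the Python below regroups the rows read from each input's state table by join key, independent of whether the operator stored them as a value state or a list/map state. The `(join_key, record)` row shape and the `join_key_view` name are assumptions for illustration:

```python
from collections import defaultdict
from typing import Any, Iterable


def join_key_view(
    left_rows: Iterable[tuple[Any, Any]],
    right_rows: Iterable[tuple[Any, Any]],
) -> dict[Any, dict[str, list[Any]]]:
    """Group the state records of both join inputs by join key.

    Each input is an iterable of (join_key, record) pairs read from that
    input's state table, regardless of the physical state layout the
    operator chose (value state for unique keys, list/map state otherwise).
    """
    # Every new join key starts with empty record lists for both inputs.
    view: dict[Any, dict[str, list[Any]]] = defaultdict(lambda: {"left": [], "right": []})
    for key, record in left_rows:
        view[key]["left"].append(record)
    for key, record in right_rows:
        view[key]["right"].append(record)
    return dict(view)


view = join_key_view(
    left_rows=[("k1", "order-1"), ("k2", "order-2")],
    right_rows=[("k1", "payment-a"), ("k1", "payment-b")],
)
print(view["k1"])  # {'left': ['order-1'], 'right': ['payment-a', 'payment-b']}
```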

Returning to the one-to-one mapping between savepoints and catalogs, we aim
to manage multiple user state catalogs through a catalog store. When a user
triggers a savepoint for a job on the platform:

1. The platform sends a REST request to the JobManager.
2. Simultaneously, it registers a new state catalog in the catalog store,
enabling immediate analysis of state data on the platform.
3. Deleting a savepoint would also trigger the removal of its associated
catalog.

This vision assumes that states are self-describing or that a state
metaservice is introduced to analyze savepoint structures.
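A minimal sketch of this lifecycle in Python. `StateCatalogStore`, `on_savepoint_triggered` and `on_savepoint_deleted` are hypothetical names (not Flink's `CatalogStore` interface), the store is in-memory, and `trigger_savepoint` is an injected callable standing in for the REST request to the JobManager; for simplicity the two steps run one after the other rather than simultaneously:

```python
from typing import Callable


class StateCatalogStore:
    """Hypothetical in-memory store keeping one state catalog per savepoint."""

    def __init__(self) -> None:
        self._catalogs: dict[str, str] = {}  # catalog name -> savepoint path

    def register(self, name: str, savepoint_path: str) -> None:
        self._catalogs[name] = savepoint_path

    def remove(self, name: str) -> None:
        self._catalogs.pop(name, None)

    def list_catalogs(self) -> list[str]:
        return sorted(self._catalogs)


def on_savepoint_triggered(
    store: StateCatalogStore,
    job_id: str,
    trigger_savepoint: Callable[[str], tuple[str, str]],
) -> str:
    """Trigger a savepoint and register a state catalog for it.

    `trigger_savepoint` stands in for the REST call to the JobManager and is
    assumed to return (savepoint_id, savepoint_path).
    """
    savepoint_id, path = trigger_savepoint(job_id)
    catalog_name = f"savepoint-{savepoint_id}"
    store.register(catalog_name, path)
    return catalog_name


def on_savepoint_deleted(store: StateCatalogStore, savepoint_id: str) -> None:
    """Deleting a savepoint also drops its catalog (one-to-one mapping)."""
    store.remove(f"savepoint-{savepoint_id}")


store = StateCatalogStore()
# A fake trigger returning a made-up savepoint id and path.
on_savepoint_triggered(store, "job-1", lambda job: ("7", f"/savepoints/{job}/sp-7"))
print(store.list_catalogs())  # ['savepoint-7']
on_savepoint_deleted(store, "7")
print(store.list_catalogs())  # []
```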

> How can users create logic to identify differences between multiple
> savepoints?

Since savepoints and state catalogs are one-to-one mapped, users can query
metadata via their respective catalogs. For example:

1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>` provides
operator-specific metadata (e.g., state size, type).
2. Comparing metadata tables (e.g., schema versions, state entry counts)
across catalogs reveals structural or quantitative differences.
3. For deeper analysis, users could write SQL queries to compare specific
state partitions or leverage the metaservice to track state evolution
(e.g., added/removed operators, modified state configurations).
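To illustrate step 2, here is a hedged Python sketch that compares two per-operator metadata tables keyed by operator hash and reports added operators, removed operators and state size changes. The column names `operatorHash` and `totalStatesSizeInBytes` are borrowed from the example operator metadata table in Gabor's March 10 message; representing rows as dicts is an assumption:

```python
def diff_savepoints(old_rows: list[dict], new_rows: list[dict]) -> dict:
    """Compare the operator metadata tables of two savepoints.

    Each row is a dict with at least "operatorHash" and
    "totalStatesSizeInBytes". Returns the operators added and removed
    between the two savepoints and, for operators present in both, the
    change in total state size.
    """
    old = {row["operatorHash"]: row for row in old_rows}
    new = {row["operatorHash"]: row for row in new_rows}
    size_delta = {
        op: new[op]["totalStatesSizeInBytes"] - old[op]["totalStatesSizeInBytes"]
        for op in sorted(old.keys() & new.keys())
    }
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "size_delta": size_delta,
    }


# Illustrative rows; the hashes are placeholders, not real operator hashes.
sp1 = [
    {"operatorHash": "hash-a", "totalStatesSizeInBytes": 546},
    {"operatorHash": "hash-b", "totalStatesSizeInBytes": 40726},
]
sp2 = [
    {"operatorHash": "hash-b", "totalStatesSizeInBytes": 51200},
    {"operatorHash": "hash-c", "totalStatesSizeInBytes": 0},
]
print(diff_savepoints(sp1, sp2))
# {'added': ['hash-c'], 'removed': ['hash-a'], 'size_delta': {'hash-b': 10474}}
```

In SQL, the same result would come from joining the two metadata tables on the operator hash; the sketch only makes the comparison logic explicit.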

If we plan to introduce a state catalog in the future, I would lean toward
using metadata tables. If a utility tool can address the challenges we
face, could we avoid introducing an additional connector?

Best,
Shengkai

On Mon, Mar 17, 2025 at 20:25 Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi All!
>
> Without going into too much detail, here are my 2 cents on the
> virtual column / catalog metadata / table (connector) discussion for the
> state metadata.
>
> State metadata such as the types of states, their properties, names, sizes,
> etc. is valuable information that can be used to enrich the
> computations we do on state.
> We can analyze it standalone (e.g., to discover anomalies in large
> jobs with many states), across multiple savepoints (to discover how state
> changed over time), or by joining it with keyed or non-keyed state data to
> serve more complex queries on the state.
>
> The only solution that seems to serve all these use cases and requirements
> in a straightforward and SQL-canonical way is to simply expose the state
> metadata as a separate table. This is a metadata table, but you can also
> think of it as a data table; it makes no practical difference here.
>
> Once we have a catalog later, the catalog can offer this table out of the
> box, the same way databases provide metadata tables. For this to work,
> however, we need another, simpler connector that creates this table.
>
> +1 for state metadata as a separate connector/table, instead of adding
> virtual columns and ad hoc catalog metadata that are hard to use in a large
> number of queries.
>
> Cheers,
> Gyula
>
> On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> > 1. State TTL for Value Columns
> >
> > > I’m planning on adding this, and we may collaborate on it in the
> > > future.
> >
> > +1 on this, just ping me.
> >
> > 2. Metadata Table vs. Metadata Column
> >
> > After some code digging and a POC, all I can say is that with heavy effort
> > we can maybe add changes that let us show the metadata of a savepoint from
> > the catalog.
> > I'm not against that, but from a user perspective this has limited value;
> > let me explain why.
> >
> > From a high-level perspective, I see the following points we agree on:
> > * We should have a catalog that represents the savepoint data sets of one
> > or more jobs (future plan)
> > * It should be possible to register savepoints in the catalog, where they
> > then appear as databases (future plan)
> > * It must be possible to create tables from databases through which users
> > can read state data (exists already)
> >
> > In terms of metadata, if I understand correctly, the suggested approach
> > would be to access it via the catalog describe command, right? Adding that
> > info when a specific database describe command is executed could be done.
> >
> > The question is, for instance, how can users build logic that tells them
> > the difference between multiple savepoints?
> > Just to give some examples:
> > * per-operator size changes between savepoints
> > * show values from operator data where the state size reaches a boundary
> > * in general, "find which checkpoint ruined things" is quite a common
> > pattern
> > What I would like to highlight here is that from Flink's point of view the
> > metadata can be considered static side-output information, but for users
> > these values are real data around which they plan to build logic.
> >
> > > The metadata is more like one-time information than streaming data
> > > that changes all the time, so a separate connector seems like overkill.
> >
> > State data is also static within a savepoint, which is why the state
> > processor API works in batch mode.
> > When we handle multiple checkpoints in a streaming fashion, this can be
> > viewed from another angle.
> >
> > We could come up with a more lightweight solution than a new connector,
> > but forcing users to parse the catalog describe output in order to compare
> > multiple savepoints doesn't sound like a smooth user experience.
> > Honestly, I have no other idea for exposing metadata as real user data, so
> > I'm waiting for other approaches.
> >
> > BR,
> > G
> >
> >
> > On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com> wrote:
> >
> > > Looking forward to hearing the good news!
> > >
> > > Best,
> > > Shengkai
> > >
> > > On Wed, Mar 12, 2025 at 22:24 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >
> > > > Thank you both for the valuable input!
> > > >
> > > > Let me take a closer look at the suggestions, like the Catalog
> > > > capabilities and the possibility of embedding TypeInformation or
> > > > StateDescriptor metadata directly into the raw state files...
> > > >
> > > > BR,
> > > > G
> > > >
> > > >
> > > > On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >
> > > > > Thanks, Zakelly, for the clarification.
> > > > >
> > > > > 1. State TTL for Value Columns
> > > > >
> > > > > +1 to postponing the discussion on this.
> > > > >
> > > > > 2. Metadata Table vs. Metadata Column
> > > > >
> > > > > I’d like to share my perspective on the State Catalog proposal. While
> > > > > introducing this capability is beneficial, there is a blocker: the
> > > > > current StateBackend architecture does not permit operators to encode
> > > > > TypeInformation into the state; it only preserves the Serializer.
> > > > > This limitation creates an asymmetry, as only the operators know the
> > > > > schema of the data structure.
> > > > >
> > > > > To address this, I suggest allowing operators to embed TypeInformation
> > > > > or StateDescriptor metadata directly into the raw state files. Such a
> > > > > design would enable the Catalog to:
> > > > >
> > > > > 1. Parse state files and programmatically derive the schema and
> > > > > structural guarantees for each state.
> > > > > 2. Leverage existing Flink Table utilities, such as
> > > > > LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils),
> > > > > to bridge TypeInformation and DataType conversions.
> > > > >
> > > > > If we cannot store the TypeInformation or StateDescriptor in the raw
> > > > > state files, I am +1 for this FLIP using metadata columns to retrieve
> > > > > the information.
> > > > >
> > > > > Best,
> > > > > Shengkai
> > > > >
> > > > > On Wed, Mar 12, 2025 at 12:43 Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > >
> > > > > > Hi Gabor and Shengkai,
> > > > > >
> > > > > > Thanks for sharing your thoughts! This is a long discussion, and sorry
> > > > > > for the late reply (I've been busy catching up with the 2.0 release
> > > > > > these days).
> > > > > >
> > > > > > 1. State TTL for Value Columns
> > > > > >
> > > > > > Let me first restate your thoughts to ensure I understand correctly.
> > > > > > IIUC, there is no persistent configuration for state TTL in the
> > > > > > checkpoint. While you can infer that TTL is enabled by reading the
> > > > > > serializer, the checkpoint itself only stores the last access time for
> > > > > > each value. So the only thing we can show is the last access time of
> > > > > > each value. But state backends are not required to store this, as they
> > > > > > may store the expiration time directly instead. This would also
> > > > > > increase the difficulty of implementation and maintenance.
> > > > > >
> > > > > > This once again underlines the importance of unified metadata for
> > > > > > checkpoints. I’m planning on adding this, and we may collaborate on it
> > > > > > in the future.
> > > > > >
> > > > > > 2. Metadata Table vs. Metadata Column
> > > > > >
> > > > > > I'm not in favor of adding a new connector for metadata. The metadata
> > > > > > is more like one-time information than streaming data that changes
> > > > > > all the time, so a separate connector seems like overkill. It is not
> > > > > > easy to withdraw a connector if we find a better solution in the
> > > > > > future. I'm not familiar with the current Catalog capabilities, but if
> > > > > > it could extract and show some operator-level information from a
> > > > > > savepoint, that would be great.
> > > > > >
> > > > > > If the Catalog can't do that, I would consider the current FLIP to be
> > > > > > a compromise solution.
> > > > > >
> > > > > > And if we have unified metadata for checkpoints/savepoints in the
> > > > > > future, we may directly register a savepoint in the catalog and create
> > > > > > a source without specifying complex columns, as well as describe the
> > > > > > savepoint catalog to get the metadata. That's a good solution in my
> > > > > > mind.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Zakelly
> > > > > >
> > > > > > On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Gabor,
> > > > > > >
> > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > >
> > > > > > > I would argue against introducing a new connector type named
> > > > > > > savepoint-metadata, as the existing Catalog mechanism can inherently
> > > > > > > provide the necessary connector factory capabilities. I’ve detailed
> > > > > > > this proposal in a branch [1]. Please take a moment to review it.
> > > > > > >
> > > > > > > If we introduce a connector named `savepoint-metadata`, users can
> > > > > > > create a temporary table with the `savepoint-metadata` connector, and
> > > > > > > the connector needs to check whether the table schema matches the
> > > > > > > schema proposed in the FLIP. Moreover, it's not easy for users to
> > > > > > > create a metadata table with exactly the same schema.
> > > > > > >
> > > > > > > [1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
> > > > > > >
> > > > > > > Best,
> > > > > > > Shengkai
> > > > > > >
> > > > > > > On Tue, Mar 11, 2025 at 16:56 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Shengkai,
> > > > > > > >
> > > > > > > > > 1. State TTL for Value Columns
> > > > > > > >
> > > > > > > > Directionally, I agree with your idea of how it can be implemented.
> > > > > > > > As I mentioned previously, TTL information is not exposed by the
> > > > > > > > state processor API (which the SQL state connector uses to read
> > > > > > > > data), and unless somebody shows me otherwise, this FLIP is not going
> > > > > > > > to address it, to avoid feature creep. Our users are also interested
> > > > > > > > in TTL, so sooner or later we're going to expose it; it's a matter of
> > > > > > > > scheduling.
> > > > > > > >
> > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > > >
> > > > > > > > I'm not sure I understand your point regarding StateCatalog at all.
> > > > > > > > First of all, I couldn't agree more that StateCatalog is needed, and
> > > > > > > > it is a planned building block in an upcoming FLIP, but I'm not sure
> > > > > > > > how it can help now. No matter what, your knowledge is essential when
> > > > > > > > we add StateCatalog. Let me lay out my understanding of this area:
> > > > > > > > * First, we need CREATE TABLE statements to access state data and
> > > > > > > > metadata
> > > > > > > > * Once we have that, we can add StateCatalog, which could potentially
> > > > > > > > make users' lives easier by, for example, providing off-the-shelf
> > > > > > > > tables without having to sweat over CREATE TABLE statements
> > > > > > > >
> > > > > > > > User expectations:
> > > > > > > > * See state data (this is fulfilled by the existing connector)
> > > > > > > > * See metadata about state data like TTL (this can be added as a
> > > > > > > > metadata column, as you suggested, since it belongs to the data)
> > > > > > > > * See metadata about operators (this can be added via
> > > > > > > > savepoint-metadata)
> > > > > > > >
> > > > > > > > It's important to highlight that the state data table format differs
> > > > > > > > from the state metadata table format. Namely, one table has rows for
> > > > > > > > state values and the other has rows for operators, right?
> > > > > > > > I think that's why you've pointed out that the suggested metadata
> > > > > > > > columns are somewhat clunky.
> > > > > > > >
> > > > > > > > In conclusion, I agree to add a ${state-name}_ttl metadata column
> > > > > > > > later on, since it belongs to the state value, and to add a new table
> > > > > > > > type for metadata (similar to PG, as you suggested [1]).
> > > > > > > > Please see how Spark does that too [2].
> > > > > > > >
> > > > > > > > If you have a better approach, please elaborate with more details and
> > > > > > > > help me understand your point.
> > > > > > > >
> > > > > > > > > Up until now we've seen, even in TB-sized savepoints, that the
> > > > > > > > > number of keys can be extremely large, but not the per-key state
> > > > > > > > > itself.
> > > > > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > > > > separate jira.
> > > > > > > >
> > > > > > > > I've just created https://issues.apache.org/jira/browse/FLINK-37456.
> > > > > > > >
> > > > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > > [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
> > > > > > > >
> > > > > > > > BR,
> > > > > > > > G
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Gabor, thanks for your response.
> > > > > > > > >
> > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > >
> > > > > > > > > Thank you for addressing the limitations here. However, I believe it
> > > > > > > > > would be beneficial to further clarify the API in this FLIP regarding
> > > > > > > > > how users can specify the TTL column.
> > > > > > > > >
> > > > > > > > > One potential approach that comes to mind is using a standardized
> > > > > > > > > naming convention such as ${state-name}_ttl for the metadata column
> > > > > > > > > that defines the TTL value. In terms of implementation, the
> > > > > > > > > listReadableMetadata function could:
> > > > > > > > >
> > > > > > > > > 1. Read the table’s columns and configuration,
> > > > > > > > > 2. Extract all defined state names, and
> > > > > > > > > 3. Return a structured list of metadata entries formatted as
> > > > > > > > > ${state-name}_ttl.
> > > > > > > > >
> > > > > > > > > WDYT?
> > > > > > > > >
> > > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > > > >
> > > > > > > > > Introducing a new connector type at this stage may unnecessarily
> > > > > > > > > complicate the system. Given that every table already belongs to a
> > > > > > > > > Catalog, which is designed to provide a Factory for building source
> > > > > > > > > or sink connectors, I propose integrating a dedicated StateCatalog
> > > > > > > > > instead. This approach would allow us to:
> > > > > > > > >
> > > > > > > > > 1. Leverage the Catalog’s existing capabilities to manage TTL
> > > > > > > > > metadata (e.g., state names and TTL logic) without duplicating
> > > > > > > > > functionality.
> > > > > > > > > 2. Provide a unified interface for connector instantiation and
> > > > > > > > > metadata handling through the Catalog’s Factory pattern.
> > > > > > > > >
> > > > > > > > > Would this design decision better align with our architecture’s
> > > > > > > > > extensibility and reduce redundancy?
> > > > > > > > >
> > > > > > > > > > Up until now we've seen, even in TB-sized savepoints, that the
> > > > > > > > > > number of keys can be extremely large, but not the per-key state
> > > > > > > > > > itself.
> > > > > > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > > > > > separate jira.
> > > > > > > > >
> > > > > > > > > +1 for a separate jira.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Shengkai
> > > > > > > > >
> > > > > > > > > On Mon, Mar 10, 2025 at 19:05 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Shengkai,
> > > > > > > > > >
> > > > > > > > > > Please see my comments inline.
> > > > > > > > > >
> > > > > > > > > > BR,
> > > > > > > > > > G
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Gabor, thanks for the FLIP. I have some questions about it:
> > > > > > > > > > >
> > > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > > > How can users retrieve the state TTL (Time-to-Live) for each
> > > > > > > > > > > value column? From my understanding of the current design, this
> > > > > > > > > > > functionality does not seem to be supported. Could you clarify
> > > > > > > > > > > whether there are plans to address this limitation?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Since the state processor API does not yet expose this
> > > > > > > > > > information, this would require several steps.
> > > > > > > > > > First, support needs to be added to the state processor API,
> > > > > > > > > > which can then be exposed on the SQL API.
> > > > > > > > > > This is definitely a useful future improvement that can be
> > > > > > > > > > handled in a separate jira.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > > > > > The metadata information described in the FLIP appears to be
> > > > > > > > > > > intended to describe the state files stored at a specific
> > > > > > > > > > > location. To me, this concept aligns more closely with system
> > > > > > > > > > > tables like pg_tables in PostgreSQL [1] or the
> > > > > > > > > > > INFORMATION_SCHEMA in MySQL [2].
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Adding a new connector with `savepoint-metadata` is one way we
> > > > > > > > > > could create such functionality.
> > > > > > > > > > I'm not against that; I just want us to agree that we would
> > > > > > > > > > like to move in that direction.
> > > > > > > > > > (As a side note, not just PG but Spark also has a similar
> > > > > > > > > > approach, and I basically like the idea.)
> > > > > > > > > > If we went in that direction, savepoint metadata could be
> > > > > > > > > > exposed such that each row represents an operator with its
> > > > > > > > > > values, something like this:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┐
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> │operatorN│operatorU│operatorH│paralleli│maxParall│subtaskSt│coordinat│totalSta│
> > > > > > > > > > │ame      │id       │ash      │sm       │elism
> > > > > > > > > > │atesCount│orStateSi│tesSizeI│
> > > > > > > > > > │         │         │         │         │         │
> > > > > > > > > >  │zeInBytes│nBytes  │
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > > > > > > │Source:  │datagen-s│47aee9439│2        │128      │2
> > > │16
> > > > > > > > > >  │546     │
> > > > > > > > > > │datagen-s│ource-uid│4d6ea26e2│         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > > │ource    │         │d544bef0a│         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > > │         │         │37bb5    │         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > > > > > > │long-udf-│long-udf-│6ed3f40bf│2        │128      │2
> > > │0
> > > > > > > > │0
> > > > > > > > > >      │
> > > > > > > > > > │with-mast│with-mast│f3c8dfcdf│         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > > │er-hook  │er-hook-u│cb95128a1│         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > > │         │id       │018f1    │         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > > > > > > │value-pro│value-pro│ca4f5fe9a│2        │128      │2
> > > │0
> > > > > > > > > > │40726   │
> > > > > > > > > > │cess     │cess-uid │637b656f0│         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > > │         │         │9ea78b3e7│         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > > │         │         │a15b9    │         │         │
> >  │
> > > > > > >  │
> > > > > > > > > >     │
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > > > > > >
> > > > > > > > > > This table can then be joined with tables created by the existing
> > > > > > > > > > `savepoint` connector based on the UID hash (which is unique and
> > > > > > > > > > always exists).
> > > > > > > > > > This would mean that the existing table would need only a single
> > > > > > > > > > metadata column, namely the UID hash.
> > > > > > > > > > WDYT?
> > > > > > > > > > @zakelly, please share your thoughts too.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > If we opt to use metadata columns, every record in the table
> > > > > > > > > > > would end up having identical values for these columns (please
> > > > > > > > > > > correct me if I’m mistaken). On the other hand, the state
> > > > > > > > > > > connector requires users to specify an operator UID or operator
> > > > > > > > > > > UID hash, after which it outputs user-defined values in its
> > > > > > > > > > > records. This approach feels somewhat redundant to me.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If we added a new `savepoint-metadata` connector, this could be
> > > > > > > > > > addressed.
> > > > > > > > > > On the other hand, UID and UID hash have an either-or
> > > > > > > > > > relationship from a configuration perspective, so when a user
> > > > > > > > > > provides the UID, they may be interested in the hash for further
> > > > > > > > > > calculations (all Flink internals depend on the hash). Printing
> > > > > > > > > > out the human-readable UID is an explicit requirement from the
> > > > > > > > > > user side, because hashes are not human-readable.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 3. Handling LIST and MAP States in the State Connector
> > > > > > > > > > > I have concerns about how the current design handles LIST and
> > > > > > > > > > > MAP states. Specifically, the state connector uses Flink SQL’s
> > > > > > > > > > > MAP and ARRAY types, which implies that it attempts to load
> > > > > > > > > > > entire MAP or LIST states into memory.
> > > > > > > > > > >
> > > > > > > > > > > However, in many real-world scenarios, these states can grow
> > > > > > > > > > > very large. Typically, the state API addresses this by providing
> > > > > > > > > > > an iterator to traverse elements within the state
> > > > > > > > > > > incrementally. I’m unsure whether I’ve missed something in
> > > > > > > > > > > FLIP-496 or FLIP-512, but it seems that the current design might
> > > > > > > > > > > struggle with scalability in such cases.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > You're right: the current implementation keeps the state for a
> > > > > > > > > > single key in memory.
> > > > > > > > > > Back in the day we considered this potential issue and concluded
> > > > > > > > > > that it is not necessarily needed for the initial version and can
> > > > > > > > > > be done as a later improvement.
> > > > > > > > > >
> > > > > > > > > > Up until now we've seen, even in TB-sized savepoints, that the
> > > > > > > > > > number of keys can be extremely large, but not the per-key state
> > > > > > > > > > itself.
> > > > > > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > > > > > separate jira.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Shengkai
> > > > > > > > > > >
> > > > > > > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > > > > > [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 3, 2025 at 02:00 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Zakelly,
> > > > > > > > > > > >
> > > > > > > > > > > > For the sake of simplicity, the target is to use `METADATA VIRTUAL`
> > > > > > > > > > > > as the keywords for the definition.
> > > > > > > > > > > > If it's not too complex, the latter can be added too.
> > > > > > > > > > > >
> > > > > > > > > > > > BR,
> > > > > > > > > > > > G
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Gabor,
> > > > > > > > > > > > >
> > > > > > > > > > > > > +1 for this.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Will the metadata column use `METADATA VIRTUAL` as the keywords
> > > > > > > > > > > > > for the definition, or `METADATA FROM xxx VIRTUAL` for renaming,
> > > > > > > > > > > > > just like the Kafka table?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Zakelly
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'd like to start a discussion of FLIP-512: Add meta information
> > > > > > > > > > > > > > to SQL state connector [1].
> > > > > > > > > > > > > > Feel free to add your thoughts to make this feature better.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > BR,
> > > > > > > > > > > > > > G
