Re: [DISCUSS] FLIP-512: Add meta information to SQL state connector

Shengkai Fang Wed, 12 Mar 2025 00:26:28 -0700

Thanks to Zakelly for the clarification.

1. State TTL for Value Columns


+1 to delay the discussion about this.

2. Metadata Table vs. Metadata Column

I’d like to share my perspective on the State Catalog proposal. While
introducing this capability is beneficial, there is a blocker: the current
StateBackend architecture does not permit operators to encode
TypeInformation into the state—it only preserves the Serializer. This
limitation creates an asymmetry, as operators alone retain knowledge of the
data structure’s schema.

To address this, I suggest allowing operators to embed TypeInformation or
StateDescriptor metadata directly into the raw state files. Such a design
would enable the Catalog to:

1. Parse state files and programmatically derive the schema and structural
guarantees for each state.
2. Leverage existing Flink Table utilities, such as
LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to
bridge TypeInformation and DataType conversions.

If we cannot store the TypeInformation or StateDescriptor in the raw
state files, I am +1 for this FLIP to use a metadata column to retrieve
the information.
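
To make the schema-derivation idea above concrete, here is a minimal sketch in plain Java with no Flink dependency. Everything in it is an assumption for illustration: `StateMeta`, `toSqlType`, and `deriveColumns` are hypothetical names, and the type strings merely stand in for real TypeInformation and DataType objects. In Flink itself the mapping step would presumably be delegated to a converter such as LegacyTypeInfoDataTypeConverter rather than to a hand-written lookup.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StateSchemaSketch {

    // Hypothetical metadata an operator could embed next to its state:
    // the state name plus a textual description of its Java type.
    static final class StateMeta {
        final String stateName;
        final String javaType;

        StateMeta(String stateName, String javaType) {
            this.stateName = stateName;
            this.javaType = javaType;
        }
    }

    // Stand-in for a TypeInformation -> DataType bridge (assumption: a real
    // catalog would call Flink's converter utilities instead of this switch).
    static String toSqlType(String javaType) {
        switch (javaType) {
            case "java.lang.Long":
                return "BIGINT";
            case "java.lang.Integer":
                return "INT";
            case "java.lang.String":
                return "STRING";
            case "java.lang.Boolean":
                return "BOOLEAN";
            default:
                // Placeholder for types this sketch cannot map.
                return "RAW";
        }
    }

    // Derive an ordered column-name -> SQL-type schema from embedded metadata.
    static Map<String, String> deriveColumns(List<StateMeta> metas) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (StateMeta meta : metas) {
            columns.put(meta.stateName, toSqlType(meta.javaType));
        }
        return columns;
    }

    public static void main(String[] args) {
        List<StateMeta> metas = Arrays.asList(
                new StateMeta("count", "java.lang.Long"),
                new StateMeta("lastUser", "java.lang.String"));
        // Prints the derived schema, one column per state.
        System.out.println(deriveColumns(metas));
    }
}
```

The point of the sketch is only that, once type metadata travels with the state, a catalog can build a table schema without asking the user to declare columns by hand.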

Best,
Shengkai




Zakelly Lan <[email protected]> wrote on Wed, Mar 12, 2025 at 12:43:

> Hi Gabor and Shengkai,
>
> Thanks for sharing your thoughts! This is a long discussion and sorry for
> the late reply (I'm busy catching up with release 2.0 these days).
>
> 1. State TTL for Value Columns
>
>
> Let me first clarify your thoughts to ensure I understand correctly. IIUC,
> there is no persistent configuration for state TTL in the checkpoint. While
> you can infer that TTL is enabled by reading the serializer, the checkpoint
> itself only stores the last access time for each value. So the only thing
> we can show is the last access time for each value. But it is not required
> for all state backends to store this, as they may directly store the
> expired time. This will also increase the difficulty of implementation &
> maintenance.
>
> This once again reiterates the importance of unified metadata for
> checkpoints. I’m planning on adding this, and we may collaborate on it in
> the future.
>
> 2. Metadata Table vs. Metadata Column
>
>
> I'm not in favor of adding a new connector for metadata. The metadata is
> more like one-time information than streaming data that changes all the
> time, so a dedicated connector seems like overkill. It is also not easy to
> withdraw a connector if we find a better solution in the future. I'm not
> familiar with the current Catalog capabilities, but if it could extract and
> show some operator-level information from a savepoint, that would be great.
>
> If the Catalog can't do that, I would consider the current FLIP to be a
> compromise solution.
>
> And if we have that unified metadata for checkpoints/savepoints in the
> future, we may directly register a savepoint in the catalog, create a
> source without specifying complex columns, and describe the savepoint
> catalog to get the metadata. That's a good solution in my mind.
>
>
> Best,
> Zakelly
>
> On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <[email protected]> wrote:
>
> > Hi Gabor,
> >
> > > 2. Adding a new connector with `savepoint-metadata`
> >
> > I would argue against introducing a new connector type named
> > savepoint-metadata, as the existing Catalog mechanism can inherently
> > provide the necessary connector factory capabilities. I’ve detailed this
> > proposal in branch[1]. Please take a moment to review it.
> >
> > If we introduce a connector named `savepoint-metadata`, users can create
> > a temporary table with the connector `savepoint-metadata`, and the
> > connector then needs to check whether the table schema is the same as the
> > schema we proposed in the FLIP. On the other hand, it's not easy for
> > others to use a metadata table with the same schema.
> >
> > [1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
> >
> > Best,
> > Shengkai
> >
> > Gabor Somogyi <[email protected]> wrote on Tue, Mar 11, 2025 at 16:56:
> >
> > > Hi Shengkai,
> > >
> > > > 1. State TTL for Value Columns
> > >
> > > Directionally, I agree with your idea of how it can be implemented.
> > > As I mentioned previously, TTL information is not exposed by the state
> > > processor API (which the SQL state connector uses to read data), and
> > > unless somebody shows me otherwise, this FLIP is not going to address
> > > it, to avoid feature creep. Our users are also interested in TTL, so
> > > sooner or later we're going to expose it; this is a matter of scheduling.
> > >
> > > > 2. Adding a new connector with `savepoint-metadata`
> > >
> > > I'm not sure I understand your point about StateCatalog. First of all,
> > > I can't agree more that StateCatalog is needed, and it is a planned
> > > building block in an upcoming FLIP, but I'm not sure how it can help now.
> > > No matter what, your knowledge is essential when we add StateCatalog.
> > > Let me lay out my understanding in this area:
> > > * First we need CREATE TABLE statements to access state data and metadata
> > > * Once we have that, we can add StateCatalog, which could potentially
> > > make users' lives easier by, for example, providing off-the-shelf tables
> > > without sweating over CREATE TABLE statements
> > >
> > > User expectations:
> > > * See state data (this is fulfilled with the existing connector)
> > > * See metadata about state data like TTL (this can be added as metadata
> > > column as you suggested since it belongs to the data)
> > > * See metadata about operators (this can be added from savepoint-metadata)
> > >
> > > It is important to highlight that the state data table format differs
> > > from the state metadata table format. Namely, one table has rows for
> > > state values and the other has rows for operators, right?
> > > I think that's the reason why you've pointed out that the suggested
> > > metadata columns are somewhat clunky.
> > >
> > > In conclusion, I agree with adding the ${state-name}_ttl metadata column
> > > later on, since it belongs to the state value, and with adding a new
> > > table type for metadata (as you suggested, similar to PG [1]).
> > > Please see how Spark does that too [2].
> > >
> > > If you have better approach then please elaborate with more details and
> > > help me to understand your point.
> > >
> > > > Up until now we've seen even in TB savepoints that the number of keys can
> > > > be extremely huge but not the per key state itself.
> > > > But again, this is a good feature as-is and can be handled in a separate
> > > > jira.
> > >
> > > I've just created https://issues.apache.org/jira/browse/FLINK-37456.
> > >
> > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
> > >
> > > BR,
> > > G
> > >
> > >
> > > On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <[email protected]> wrote:
> > >
> > > > Hi, Gabor. Thanks for your response.
> > > >
> > > > > 1. State TTL for Value Columns
> > > >
> > > > Thank you for addressing the limitations here. However, I believe it would
> > > > be beneficial to further clarify the API in this FLIP regarding how users
> > > > can specify the TTL column.
> > > >
> > > > One potential approach that comes to mind is using a standardized naming
> > > > convention such as ${state-name}_ttl for the metadata column that defines
> > > > the TTL value. In terms of implementation, the listReadableMetadata
> > > > function could:
> > > >
> > > > 1. Read the table’s columns and configuration,
> > > > 2. Extract all defined state names, and
> > > > 3. Return a structured list of metadata entries formatted as
> > > > ${state-name}_ttl.
> > > >
> > > > WDYT?
> > > >
> > > > > 2. Adding a new connector with `savepoint-metadata`
> > > >
> > > > Introducing a new connector type at this stage may unnecessarily complicate
> > > > the system. Given that every table already belongs to a Catalog, which is
> > > > designed to provide a Factory for building source or sink connectors, I
> > > > propose integrating a dedicated StateCatalog instead. This approach would
> > > > allow us to:
> > > >
> > > > 1. Leverage the Catalog’s existing capabilities to manage TTL metadata
> > > > (e.g., state names and TTL logic) without duplicating functionality.
> > > > 2. Provide a unified interface for connector instantiation and metadata
> > > > handling through the Catalog’s Factory pattern.
> > > >
> > > > Would this design decision better align with our architecture’s
> > > > extensibility and reduce redundancy?
> > > >
> > > > > Up until now we've seen even in TB savepoints that the number of keys can
> > > > > be extremely huge but not the per key state itself.
> > > > > But again, this is a good feature as-is and can be handled in a separate
> > > > > jira.
> > > >
> > > > +1 for a separate jira.
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > > > Gabor Somogyi <[email protected]> wrote on Mon, Mar 10, 2025 at 19:05:
> > > >
> > > > > Hi Shengkai,
> > > > >
> > > > > Please see my comments inline.
> > > > >
> > > > > BR,
> > > > > G
> > > > >
> > > > >
> > > > > On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <[email protected]> wrote:
> > > > >
> > > > > > Hi, Gabor. Thanks for the FLIP. I have some questions about it:
> > > > > >
> > > > > > 1. State TTL for Value Columns
> > > > > > How can users retrieve the state TTL (Time-to-Live) for each value column?
> > > > > > From my understanding of the current design, it seems that this
> > > > > > functionality is not supported. Could you clarify if there are plans to
> > > > > > address this limitation?
> > > > > >
> > > > >
> > > > > Since the state processor API does not yet expose this information, this
> > > > > would require several steps.
> > > > > First, support needs to be added to the state processor API, which can
> > > > > then be exposed on the SQL API.
> > > > > This is definitely a useful future improvement and can be handled in a
> > > > > separate jira.
> > > > >
> > > > >
> > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > The metadata information described in the FLIP appears to be intended to
> > > > > > describe the state files stored at a specific location. To me, this concept
> > > > > > aligns more closely with system tables like pg_tables in PostgreSQL [1] or
> > > > > > the INFORMATION_SCHEMA in MySQL [2].
> > > > > >
> > > > >
> > > > > Adding a new connector with `savepoint-metadata` is a possibility where we
> > > > > can create such functionality.
> > > > > I'm not against that, I just want to have a common agreement that we would
> > > > > like to move in that direction.
> > > > > (As a side note, not just PG but Spark also has a similar approach, and I
> > > > > basically like the idea.)
> > > > > If we go in that direction, savepoint metadata can be exposed in such a
> > > > > way that one row represents an operator with its values, something like
> > > > > this:
> > > > >
> > > > > ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┐
> > > > > │operatorN│operatorU│operatorH│paralleli│maxParall│subtaskSt│coordinat│totalSta│
> > > > > │ame      │id       │ash      │sm       │elism    │atesCount│orStateSi│tesSizeI│
> > > > > │         │         │         │         │         │         │zeInBytes│nBytes  │
> > > > > ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > │Source:  │datagen-s│47aee9439│2        │128      │2        │16       │546     │
> > > > > │datagen-s│ource-uid│4d6ea26e2│         │         │         │         │        │
> > > > > │ource    │         │d544bef0a│         │         │         │         │        │
> > > > > │         │         │37bb5    │         │         │         │         │        │
> > > > > ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > │long-udf-│long-udf-│6ed3f40bf│2        │128      │2        │0        │0       │
> > > > > │with-mast│with-mast│f3c8dfcdf│         │         │         │         │        │
> > > > > │er-hook  │er-hook-u│cb95128a1│         │         │         │         │        │
> > > > > │         │id       │018f1    │         │         │         │         │        │
> > > > > ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > │value-pro│value-pro│ca4f5fe9a│2        │128      │2        │0        │40726   │
> > > > > │cess     │cess-uid │637b656f0│         │         │         │         │        │
> > > > > │         │         │9ea78b3e7│         │         │         │         │        │
> > > > > │         │         │a15b9    │         │         │         │         │        │
> > > > > ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > >
> > > > > This table can then be joined with the tables created by the existing
> > > > > `savepoint` connector based on the UID hash (which is unique and always
> > > > > exists).
> > > > > This would mean that the already existing table would need only a single
> > > > > metadata column, which is the UID hash.
> > > > > WDYT?
> > > > > @zakelly, plz share your thoughts too.
> > > > >
> > > > >
> > > > > > If we opt to use metadata columns, every record in the table would end up
> > > > > > having identical values for these columns (please correct me if I’m
> > > > > > mistaken). On the other hand, the state connector requires users to specify
> > > > > > an operator UID or operator UID hash, after which it outputs user-defined
> > > > > > values in its records. This approach feels somewhat redundant to me.
> > > > > >
> > > > >
> > > > > If we add a new `savepoint-metadata` connector, then this can be
> > > > > addressed.
> > > > > On the other hand, UID and UID hash have an either-or relationship from a
> > > > > config perspective, so when a user provides the UID, he/she may be
> > > > > interested in the hash for further calculations (Flink internals depend
> > > > > entirely on the hash). Printing out the human-readable UID is an explicit
> > > > > requirement from the user side because hashes are not human readable.
> > > > >
> > > > >
> > > > > > 3. Handling LIST and MAP States in the State Connector
> > > > > > I have concerns about how the current design handles LIST and MAP states.
> > > > > > Specifically, the state connector uses Flink SQL’s MAP and ARRAY types,
> > > > > > which implies that it attempts to load entire MAP or LIST states into
> > > > > > memory.
> > > > > >
> > > > > > However, in many real-world scenarios, these states can grow very large.
> > > > > > Typically, the state API addresses this by providing an iterator to
> > > > > > traverse elements within the state incrementally. I’m unsure whether I’ve
> > > > > > missed something in FLIP-496 or FLIP-512, but it seems that the current
> > > > > > design might struggle with scalability in such cases.
> > > > > >
> > > > >
> > > > > You see it correctly: the current implementation keeps the state for a
> > > > > single key in memory.
> > > > > Back in the day we considered this potential issue and concluded that
> > > > > addressing it is not necessarily needed for the initial version and can be
> > > > > done as a later improvement.
> > > > >
> > > > > Up until now we've seen even in TB savepoints that the number of keys can
> > > > > be extremely huge but not the per key state itself.
> > > > > But again, this is a good feature as-is and can be handled in a separate
> > > > > jira.
> > > > >
> > > > >
> > > > > >
> > > > > > Best,
> > > > > > Shengkai
> > > > > >
> > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > > > > >
> > > > > > Gabor Somogyi <[email protected]> wrote on Mon, Mar 3, 2025 at 02:00:
> > > > > >
> > > > > > > Hi Zakelly,
> > > > > > >
> > > > > > > To keep things simple, `METADATA VIRTUAL` as the keywords for the
> > > > > > > definition is the target.
> > > > > > > If it's not super complex, the latter can be added too.
> > > > > > >
> > > > > > > BR,
> > > > > > > G
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Gabor,
> > > > > > > >
> > > > > > > > +1 for this.
> > > > > > > >
> > > > > > > > Will the metadata column use `METADATA VIRTUAL` as keywords for the
> > > > > > > > definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
> > > > > > > > Kafka table?
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Zakelly
> > > > > > > >
> > > > > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > I'd like to start a discussion of FLIP-512: Add meta information to SQL
> > > > > > > > > state connector [1].
> > > > > > > > > Feel free to add your thoughts to make this feature better.
> > > > > > > > >
> > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > > > > > >
> > > > > > > > > BR,
> > > > > > > > > G
