Thanks to both of you for the valuable input!

Let me take a closer look at the suggestions, like the Catalog capabilities
and the possibility of embedding TypeInformation or
StateDescriptor metadata directly into the raw state files...

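To make the "self-describing state file" idea concrete for myself, here is a
minimal sketch. It is not Flink code: the file layout, the helper names
`write_state_file` / `read_schema` and the descriptor fields are all
assumptions made up for illustration. The only point is that a reader such as
a catalog could recover the schema without knowing the operator.

```python
import io
import json
import struct

# Hypothetical layout (an assumption, not Flink's actual format):
# [4-byte big-endian header length][JSON descriptor][opaque serialized state]


def write_state_file(out, descriptor, payload):
    """Write a descriptor header followed by the serialized state bytes."""
    header = json.dumps(descriptor, sort_keys=True).encode("utf-8")
    out.write(struct.pack(">I", len(header)))
    out.write(header)
    out.write(payload)


def read_schema(inp):
    """Return the descriptor only; the serialized payload is not parsed."""
    (length,) = struct.unpack(">I", inp.read(4))
    return json.loads(inp.read(length).decode("utf-8"))


buf = io.BytesIO()
write_state_file(
    buf,
    {"stateName": "count", "stateType": "VALUE", "valueType": "BIGINT"},
    b"\x00\x00\x00\x00\x00\x00\x00\x2a",  # made-up serialized value
)
buf.seek(0)
schema = read_schema(buf)
print(schema["stateName"], schema["valueType"])
```

Whether something like this is feasible in the real state backends is exactly
what I need to check.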
BR,
G


On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:

> Thanks for Zakelly's clarification.
>
> 1. State TTL for Value Columns
>
> +1 to delay the discussion about this.
>
> 2. Metadata Table vs. Metadata Column
>
> I’d like to share my perspective on the State Catalog proposal. While
> introducing this capability is beneficial, there is a blocker: the current
> StateBackend architecture does not permit operators to encode
> TypeInformation into the state—it only preserves the Serializer. This
> limitation creates an asymmetry, as operators alone retain knowledge of the
> data structure’s schema.
>
> To address this, I suggest allowing operators to embed TypeInformation or
> StateDescriptor metadata directly into the raw state files. Such a design
> would enable the Catalog to:
>
> 1. Parse state files and programmatically derive the schema and structural
> guarantees for each state.
> 2. Leverage existing Flink Table utilities, such as
> LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to
> bridge TypeInformation and DataType conversions.
>
> If we cannot store the TypeInformation or StateDescriptor in the raw
> state files, I am +1 for this FLIP using a metadata column to retrieve
> the information.
>
> Best,
> Shengkai
>
>
>
>
> On Wed, Mar 12, 2025 at 12:43 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>
> > Hi Gabor and Shengkai,
> >
> > Thanks for sharing your thoughts! This is a long discussion and sorry for
> > the late reply (I'm busy catching up with release 2.0 these days).
> >
> > 1. State TTL for Value Columns
> >
> >
> > Let me first clarify your thoughts to ensure I understand correctly. IIUC,
> > there is no persistent configuration for state TTL in the checkpoint. While
> > you can infer that TTL is enabled by reading the serializer, the checkpoint
> > itself only stores the last access time for each value. So the only thing
> > we can show is the last access time for each value. But it is not required
> > for all state backends to store this, as they may directly store the
> > expiration time. This will also increase the difficulty of implementation &
> > maintenance.
> >
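A small sketch of why this matters. It is not Flink code; the two record
shapes and the `ttl_ms` value are assumptions for illustration. A backend that
stores the last access time and one that stores the expiration time can only
be shown in the same way if the TTL configuration is known, and that
configuration is what the checkpoint does not persist.

```python
def expiration_from_last_access(last_access_ms, ttl_ms):
    """Derive the expiration time for a backend that stores last access."""
    return last_access_ms + ttl_ms


def last_access_from_expiration(expiration_ms, ttl_ms):
    """Derive the last access time for a backend that stores expiration."""
    return expiration_ms - ttl_ms


# Hypothetical records for the same value in two different backends.
ttl_ms = 60_000  # assumed TTL; not available from the checkpoint itself
backend_a = {"value": "x", "last_access_ms": 1_000_000}
backend_b = {"value": "x", "expiration_ms": 1_060_000}

# Both conversions need ttl_ms, so neither view can be unified without it.
same_expiration = (
    expiration_from_last_access(backend_a["last_access_ms"], ttl_ms)
    == backend_b["expiration_ms"]
)
print(same_expiration)
```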
> > This once again reiterates the importance of unified metadata for
> > checkpoints. I’m planning on adding this, and we may collaborate on it in
> > the future.
> >
> > 2. Metadata Table vs. Metadata Column
> >
> >
> > I'm not in favor of adding a new connector for metadata. The metadata is
> > more like one-time information than streaming data that changes all the
> > time, so a dedicated connector seems to be overkill. It is not easy to
> > withdraw a connector if we find a better solution in the future. I'm not
> > familiar with the current Catalog capabilities, but if it could extract and
> > show some operator-level information from a savepoint, that would be great.
> >
> > If the Catalog can't do that, I would consider the current FLIP to be a
> > compromise solution.
> >
> > And if we have that unified metadata for checkpoint/savepoint in the
> > future, we may directly register the savepoint in the catalog, and create
> > a source without specifying complex columns, as well as describe the
> > savepoint catalog to get the metadata. That's a good solution in my mind.
> >
> >
> > Best,
> > Zakelly
> >
> > On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:
> >
> > > Hi Gabor,
> > >
> > > > 2. Adding a new connector with `savepoint-metadata`
> > >
> > > I would argue against introducing a new connector type named
> > > savepoint-metadata, as the existing Catalog mechanism can inherently
> > > provide the necessary connector factory capabilities. I’ve detailed this
> > > proposal in branch [1]. Please take a moment to review it.
> > >
> > > If we introduce a connector named `savepoint-metadata`, it means users
> > > can create a temporary table with the connector `savepoint-metadata`, and
> > > the connector needs to check whether the table schema is the same as the
> > > schema we proposed in the FLIP. On the other hand, it's not easy for
> > > others to use a metadata table with the same schema.
> > >
> > > [1]
> > > https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
> > >
> > > Best,
> > > Shengkai
> > >
> > > On Tue, Mar 11, 2025 at 4:56 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >
> > > > Hi Shengkai,
> > > >
> > > > > 1. State TTL for Value Columns
> > > >
> > > > From a directional perspective I agree with your idea of how it can be
> > > > implemented. Previously I've mentioned that TTL information is not
> > > > exposed on the state processor API (which the SQL state connector uses
> > > > to read data), and unless somebody shows me the opposite this FLIP is
> > > > not going to address this, to avoid feature creep. Our users are also
> > > > interested in TTL, so sooner or later we're going to expose it; this is
> > > > a matter of scheduling.
> > > >
> > > > > 2. Adding a new connector with `savepoint-metadata`
> > > >
> > > > Not sure I understand your point about StateCatalog at all. First of
> > > > all, I can't agree more that StateCatalog is needed and is a planned
> > > > building block in an upcoming FLIP, but I'm not sure how it can help
> > > > now. No matter what, your knowledge is essential when we add
> > > > StateCatalog. Let me lay out my understanding in this area:
> > > > * First we need create table statements to access state data and
> > > > metadata
> > > > * When we have that, we can add StateCatalog, which could potentially
> > > > ease the life of users by, for example, giving off-the-shelf tables
> > > > without sweating with create table statements
> > > >
> > > > User expectations:
> > > > * See state data (this is fulfilled with the existing connector)
> > > > * See metadata about state data like TTL (this can be added as a
> > > > metadata column as you suggested since it belongs to the data)
> > > > * See metadata about operators (this can be added from
> > > > savepoint-metadata)
> > > >
> > > > It is important to highlight that the state data table format differs
> > > > from the state metadata table format. Namely, one table has rows for
> > > > state values and another has rows for operators, right?
> > > > I think that's the reason why you've pointed out that the suggested
> > > > metadata columns are somewhat clunky.
> > > >
> > > > As a conclusion, I agree to adding the ${state-name}_ttl metadata column
> > > > later on, since it belongs to the state value, and to adding a new table
> > > > type for metadata (like you suggested, similar to PG [1]). Please see
> > > > how Spark does that too [2].
> > > >
> > > > If you have a better approach then please elaborate with more details
> > > > and help me to understand your point.
> > > >
> > > > > Up until now we've seen even in TB savepoints that the number of keys
> > > > > can be extremely huge but not the per key state itself.
> > > > > But again, this is a good feature as-is and can be handled in a
> > > > > separate jira.
> > > >
> > > > I've just created https://issues.apache.org/jira/browse/FLINK-37456.
> > > >
> > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > [2]
> > > > https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
> > > >
> > > > BR,
> > > > G
> > > >
> > > >
> > > > On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >
> > > > > Hi, Gabor. Thanks for your response.
> > > > >
> > > > > > 1. State TTL for Value Columns
> > > > >
> > > > > Thank you for addressing the limitations here. However, I believe it
> > > > > would be beneficial to further clarify the API in this FLIP regarding
> > > > > how users can specify the TTL column.
> > > > >
> > > > > One potential approach that comes to mind is using a standardized
> > > > > naming convention such as ${state-name}_ttl for the metadata column
> > > > > that defines the TTL value. In terms of implementation, the
> > > > > listReadableMetadata function could:
> > > > >
> > > > > 1. Read the table’s columns and configuration,
> > > > > 2. Extract all defined state names, and
> > > > > 3. Return a structured list of metadata entries formatted as
> > > > > ${state-name}_ttl.
> > > > >
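The three steps above can be sketched roughly as follows. This is plain
Python, not the connector code; the function name `list_ttl_metadata_keys`,
the rule "every non-key column is a state name" and the example column names
are assumptions for illustration only.

```python
def list_ttl_metadata_keys(columns, key_column):
    """Return '<state-name>_ttl' metadata keys for a table definition."""
    # Steps 1-2: read the columns and treat every non-key column as a
    # state name (a simplifying assumption for this sketch).
    state_names = [name for name in columns if name != key_column]
    # Step 3: format one metadata entry per state name.
    return [f"{name}_ttl" for name in state_names]


# Hypothetical table with a key column and two state columns.
keys = list_ttl_metadata_keys(["k", "MyValueState", "MyListState"], key_column="k")
print(keys)
```

The data type of each entry is left out on purpose, since the thread has not
settled what a TTL column would contain.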
> > > > > WDYT?
> > > > >
> > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > >
> > > > > Introducing a new connector type at this stage may unnecessarily
> > > > > complicate the system. Given that every table already belongs to a
> > > > > Catalog, which is designed to provide a Factory for building source or
> > > > > sink connectors, I propose integrating a dedicated StateCatalog
> > > > > instead. This approach would allow us to:
> > > > >
> > > > > 1. Leverage the Catalog’s existing capabilities to manage TTL
> > > > > metadata (e.g., state names and TTL logic) without duplicating
> > > > > functionality.
> > > > > 2. Provide a unified interface for connector instantiation and
> > > > > metadata handling through the Catalog’s Factory pattern.
> > > > >
> > > > > Would this design decision better align with our architecture’s
> > > > > extensibility and reduce redundancy?
> > > > >
> > > > > > Up until now we've seen even in TB savepoints that the number of
> > > > > > keys can be extremely huge but not the per key state itself.
> > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > separate jira.
> > > > >
> > > > > +1 for a separate jira.
> > > > >
> > > > > Best,
> > > > > Shengkai
> > > > >
> > > > > On Mon, Mar 10, 2025 at 7:05 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > >
> > > > > > Hi Shengkai,
> > > > > >
> > > > > > Please see my comments inline.
> > > > > >
> > > > > > BR,
> > > > > > G
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi, Gabor. Thanks for the FLIP. I have some questions about the
> > > > > > > FLIP:
> > > > > > >
> > > > > > > 1. State TTL for Value Columns
> > > > > > > How can users retrieve the state TTL (Time-to-Live) for each value
> > > > > > > column? From my understanding of the current design, it seems that
> > > > > > > this functionality is not supported. Could you clarify if there
> > > > > > > are plans to address this limitation?
> > > > > > >
> > > > > >
> > > > > > Since the state processor API is not yet exposing this information,
> > > > > > this would require several steps.
> > > > > > First, the state processor API support needs to be added, which can
> > > > > > then be exposed on the SQL API.
> > > > > > This is definitely a useful future improvement and can be handled
> > > > > > in a separate jira.
> > > > > >
> > > > > >
> > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > The metadata information described in the FLIP appears to be
> > > > > > > intended to describe the state files stored at a specific
> > > > > > > location. To me, this concept aligns more closely with system
> > > > > > > tables like pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA
> > > > > > > in MySQL [2].
> > > > > > >
> > > > > >
> > > > > > Adding a new connector with `savepoint-metadata` is a possibility
> > > > > > where we can create such functionality.
> > > > > > I'm not against that, I just want to have a common agreement that we
> > > > > > would like to move in that direction.
> > > > > > (As a side note, not just PG but Spark also has a similar approach,
> > > > > > and I basically like the idea.)
> > > > > > If we went in that direction, savepoint metadata could be reached in
> > > > > > a way that one row would represent an operator with its values,
> > > > > > something like this:
> > > > > >
> > > > > > ┌─────────────────────────┬─────────────────────────────┬────────────────────────────────┬───────────┬──────────────┬──────────────────┬───────────────────────────┬──────────────────────┐
> > > > > > │operatorName             │operatorUid                  │operatorHash                    │parallelism│maxParallelism│subtaskStatesCount│coordinatorStateSizeInBytes│totalStatesSizeInBytes│
> > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > │Source: datagen-source   │datagen-source-uid           │47aee94394d6ea26e2d544bef0a37bb5│2          │128           │2                 │16                         │546                   │
> > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > │long-udf-with-master-hook│long-udf-with-master-hook-uid│6ed3f40bff3c8dfcdfcb95128a1018f1│2          │128           │2                 │0                          │0                     │
> > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > │value-process            │value-process-uid            │ca4f5fe9a637b656f09ea78b3e7a15b9│2          │128           │2                 │0                          │40726                 │
> > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > >
> > > > > > This table can then be joined with the tables created by the already
> > > > > > existing `savepoint` connector based on the UID hash (which is unique
> > > > > > and always exists).
> > > > > > This would mean that the already existing table would need only a
> > > > > > single metadata column, which is the UID hash.
> > > > > > WDYT?
> > > > > > @zakelly, plz share your thoughts too.
> > > > > >
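The join on the UID hash can be sketched like this. It is plain Python over
in-memory rows, not SQL against the connector. The operator rows reuse values
from the table above; the state rows, the key column `k` and the column name
`MyValueState` are hypothetical.

```python
# Operator metadata rows (subset of the columns from the table above).
operators = [
    {"operatorName": "Source: datagen-source",
     "operatorHash": "47aee94394d6ea26e2d544bef0a37bb5",
     "totalStatesSizeInBytes": 546},
    {"operatorName": "value-process",
     "operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9",
     "totalStatesSizeInBytes": 40726},
]

# Hypothetical state rows carrying the UID hash as their only metadata column.
state_rows = [
    {"operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "k": 1, "MyValueState": 10},
    {"operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "k": 2, "MyValueState": 20},
]


def join_on_hash(state_rows, operators):
    """Inner join: enrich each state row with its operator's name."""
    by_hash = {op["operatorHash"]: op for op in operators}
    joined = []
    for row in state_rows:
        op = by_hash.get(row["operatorHash"])
        if op is not None:
            joined.append({**row, "operatorName": op["operatorName"]})
    return joined


joined = join_on_hash(state_rows, operators)
print([(r["k"], r["operatorName"]) for r in joined])
```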
> > > > > >
> > > > > > > If we opt to use metadata columns, every record in the table would
> > > > > > > end up having identical values for these columns (please correct
> > > > > > > me if I’m mistaken). On the other hand, the state connector
> > > > > > > requires users to specify an operator UID or operator UID hash,
> > > > > > > after which it outputs user-defined values in its records. This
> > > > > > > approach feels somewhat redundant to me.
> > > > > > >
> > > > > >
> > > > > > If we added a new `savepoint-metadata` connector then this could be
> > > > > > addressed.
> > > > > > On the other hand, UID and UID hash have an either-or relationship
> > > > > > from a config perspective, so when a user provides the UID then
> > > > > > he/she can be interested in the hash for further calculations
> > > > > > (the whole Flink internals depend on the hash). Printing out the
> > > > > > human-readable UID is an explicit requirement from the user side
> > > > > > because hashes are not human readable.
> > > > > >
> > > > > >
> > > > > > > 3. Handling LIST and MAP States in the State Connector
> > > > > > > I have concerns about how the current design handles LIST and MAP
> > > > > > > states. Specifically, the state connector uses Flink SQL’s MAP and
> > > > > > > ARRAY types, which implies that it attempts to load entire MAP or
> > > > > > > LIST states into memory.
> > > > > > >
> > > > > > > However, in many real-world scenarios, these states can grow very
> > > > > > > large. Typically, the state API addresses this by providing an
> > > > > > > iterator to traverse elements within the state incrementally. I’m
> > > > > > > unsure whether I’ve missed something in FLIP-496 or FLIP-512, but
> > > > > > > it seems that the current design might struggle with scalability
> > > > > > > in such cases.
> > > > > > >
> > > > > >
> > > > > > You see it correctly, the current implementation keeps the state for
> > > > > > a single key in memory.
> > > > > > Back in the day we considered this potential issue and concluded
> > > > > > that this is not necessarily needed for the initial version and can
> > > > > > be done as a later improvement.
> > > > > >
> > > > > > Up until now we've seen even in TB savepoints that the number of keys
> > > > > > can be extremely huge but not the per key state itself.
> > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > separate jira.
> > > > > >
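To illustrate the difference being discussed, here is a small sketch in plain
Python. It is not the connector implementation; `read_list_state` is a
hypothetical stand-in for a per-key list state iterator and the chunk size is
arbitrary. Materializing mirrors what an ARRAY column implies, while chunked
iteration keeps only a bounded number of elements in memory.

```python
from itertools import islice


def read_list_state(n):
    """Hypothetical iterator over the n elements of one key's list state."""
    for i in range(n):
        yield i


def materialize(elements):
    """Load the whole state for a key at once (ARRAY-style)."""
    return list(elements)


def chunked(elements, size):
    """Yield the state in bounded chunks instead of all at once."""
    iterator = iter(elements)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


whole = materialize(read_list_state(5))
chunks = list(chunked(read_list_state(5), 2))
print(whole, chunks)
```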
> > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Shengkai
> > > > > > >
> > > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > [2]
> > > > > > > https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > > > > > >
> > > > > > > On Mon, Mar 3, 2025 at 2:00 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Zakelly,
> > > > > > > >
> > > > > > > > To keep things simple, `METADATA VIRTUAL` as the key words for
> > > > > > > > the definition is the target.
> > > > > > > > If it's not super complex, the latter can be added too.
> > > > > > > >
> > > > > > > > BR,
> > > > > > > > G
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Gabor,
> > > > > > > > >
> > > > > > > > > +1 for this.
> > > > > > > > >
> > > > > > > > > Will the metadata column use `METADATA VIRTUAL` as key words for
> > > > > > > > > definition, or `METADATA FROM xxx VIRTUAL` for renaming, just
> > > > > > > > > like the Kafka table?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Zakelly
> > > > > > > > >
> > > > > > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > I'd like to start a discussion of FLIP-512: Add meta
> > > > > > > > > > information to SQL state connector [1].
> > > > > > > > > > Feel free to add your thoughts to make this feature better.
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > > > > > > >
> > > > > > > > > > BR,
> > > > > > > > > > G
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
