Re: [DISCUSS] FLIP-512: Add meta information to SQL state connector

Shengkai Fang Wed, 12 Mar 2025 18:45:51 -0700

Looking forward to hearing the good news!

Best,
Shengkai


On Wed, Mar 12, 2025 at 10:24 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

> Thanks to you both for the valuable input!
>
> Let me take a closer look at the suggestions, such as the Catalog
> capabilities and the possibility of embedding TypeInformation or
> StateDescriptor metadata directly into the raw state files...
>
> BR,
> G
>
>
> On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> > Thanks for Zakelly's clarification.
> >
> > 1. State TTL for Value Columns
> >
> > +1 to delay the discussion about this.
> >
> > 2. Metadata Table vs. Metadata Column
> >
> > I’d like to share my perspective on the State Catalog proposal. While
> > introducing this capability is beneficial, there is a blocker: the current
> > StateBackend architecture does not permit operators to encode
> > TypeInformation into the state; it only preserves the Serializer. This
> > limitation creates an asymmetry, as only the operators retain knowledge of
> > the data structure’s schema.
> >
> > To address this, I suggest allowing operators to embed TypeInformation or
> > StateDescriptor metadata directly into the raw state files. Such a design
> > would enable the Catalog to:
> >
> > 1. Parse state files and programmatically derive the schema and structural
> > guarantees for each state.
> > 2. Leverage existing Flink Table utilities, such as
> > LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to
> > bridge TypeInformation and DataType conversions.
> >
> > If we cannot store the TypeInformation or StateDescriptor in the raw
> > state files, I am +1 for this FLIP using a metadata column to retrieve the
> > information.
> >
> > Best,
> > Shengkai
> >
> >
> >
> >
> > On Wed, Mar 12, 2025 at 12:43 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> >
> > > Hi Gabor and Shengkai,
> > >
> > > Thanks for sharing your thoughts! This is a long discussion, and sorry for
> > > the late reply (I'm busy catching up with release 2.0 these days).
> > >
> > > 1. State TTL for Value Columns
> > >
> > >
> > > Let me first restate your thoughts to make sure I understand correctly.
> > > IIUC, there is no persistent configuration for state TTL in the checkpoint.
> > > While you can infer that TTL is enabled by reading the serializer, the
> > > checkpoint itself only stores the last access time for each value. So the
> > > only thing we can show is the last access time for each value. But not all
> > > state backends are required to store this, as they may directly store the
> > > expiration time. This will also increase the difficulty of implementation
> > > and maintenance.
> > >
> > > This once again underlines the importance of unified metadata for
> > > checkpoints. I’m planning on adding this, and we may collaborate on it in
> > > the future.
> > >
> > > 2. Metadata Table vs. Metadata Column
> > >
> > >
> > > I'm not in favor of adding a new connector for metadata. The metadata is
> > > more like one-time information than streaming data that changes all the
> > > time, so a separate connector seems to be overkill. It is not easy to
> > > withdraw a connector if we have a better solution in the future. I'm not
> > > familiar with the current Catalog capabilities, but if it could extract and
> > > show some operator-level information from a savepoint, that would be great.
> > >
> > > If the Catalog can't do that, I would consider the current FLIP to be a
> > > compromise solution.
> > >
> > > And if we have that unified metadata for checkpoints/savepoints in the
> > > future, we may directly register a savepoint in the catalog, create a
> > > source without specifying complex columns, and describe the savepoint
> > > catalog to get the metadata. That's a good solution in my mind.
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >
> > > > Hi Gabor,
> > > >
> > > > > 2. Adding a new connector with `savepoint-metadata`
> > > >
> > > > I would argue against introducing a new connector type named
> > > > savepoint-metadata, as the existing Catalog mechanism can inherently
> > > > provide the necessary connector factory capabilities. I’ve detailed this
> > > > proposal in a branch [1]. Please take a moment to review it.
> > > >
> > > > If we introduce a connector named `savepoint-metadata`, it means users can
> > > > create a temporary table with the connector `savepoint-metadata`, and the
> > > > connector needs to check whether the table schema is the same as the
> > > > schema we proposed in the FLIP. On the other hand, it's not easy for
> > > > others to use a metadata table with the same schema.
> > > >
> > > > [1]
> > > > https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > > > On Tue, Mar 11, 2025 at 4:56 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >
> > > > > Hi Shengkai,
> > > > >
> > > > > > 1. State TTL for Value Columns
> > > > >
> > > > > Directionally, I agree with your idea of how it could be implemented.
> > > > > Previously I've mentioned that TTL information is not exposed by the
> > > > > state processor API (which the SQL state connector uses to read data),
> > > > > and unless somebody shows me otherwise, this FLIP is not going to
> > > > > address this, to avoid feature creep. Our users are also interested in
> > > > > TTL, so sooner or later we're going to expose it; this is a matter of
> > > > > scheduling.
> > > > >
> > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > >
> > > > > I'm not sure I fully understand your point about StateCatalog. First of
> > > > > all, I can't agree more that StateCatalog is needed, and it is a planned
> > > > > building block in an upcoming FLIP, but I'm not sure how it can help now.
> > > > > No matter what, your knowledge will be essential when we add
> > > > > StateCatalog. Let me lay out my understanding in this area:
> > > > > * First we need create table statements to access state data and metadata
> > > > > * Once we have that, we can add StateCatalog, which could potentially
> > > > > ease the life of users by, for example, providing off-the-shelf tables
> > > > > without sweating over create table statements
> > > > >
> > > > > User expectations:
> > > > > * See state data (this is fulfilled by the existing connector)
> > > > > * See metadata about state data, like TTL (this can be added as a
> > > > > metadata column as you suggested, since it belongs to the data)
> > > > > * See metadata about operators (this can be added from savepoint-metadata)
> > > > >
> > > > > It's important to highlight that the state data table format differs
> > > > > from the state metadata table format. Namely, one table has rows for
> > > > > state values and the other has rows for operators, right?
> > > > > I think that's the reason why you've pointed out that the suggested
> > > > > metadata columns are somewhat clunky.
> > > > >
> > > > > In conclusion, I agree with adding the ${state-name}_ttl metadata column
> > > > > later on, since it belongs to the state value, and with adding a new
> > > > > table type for metadata (as you suggested, similar to PG [1]). Please
> > > > > see how Spark does that too [2].
> > > > >
> > > > > If you have a better approach, please elaborate with more details and
> > > > > help me understand your point.
> > > > >
> > > > > > Up until now we've seen even in TB savepoints that the number of keys
> > > > > > can be extremely huge but not the per key state itself.
> > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > separate jira.
> > > > >
> > > > > I've just created https://issues.apache.org/jira/browse/FLINK-37456.
> > > > >
> > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > [2]
> > > > > https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
> > > > >
> > > > > BR,
> > > > > G
> > > > >
> > > > >
> > > > > On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Gabor. Thanks for your response.
> > > > > >
> > > > > > > 1. State TTL for Value Columns
> > > > > >
> > > > > > Thank you for addressing the limitations here. However, I believe it
> > > > > > would be beneficial to further clarify the API in this FLIP regarding
> > > > > > how users can specify the TTL column.
> > > > > >
> > > > > > One potential approach that comes to mind is using a standardized
> > > > > > naming convention such as ${state-name}_ttl for the metadata column
> > > > > > that defines the TTL value. In terms of implementation, the
> > > > > > listReadableMetadata function could:
> > > > > >
> > > > > > 1. Read the table’s columns and configuration,
> > > > > > 2. Extract all defined state names, and
> > > > > > 3. Return a structured list of metadata entries formatted as
> > > > > > ${state-name}_ttl.
> > > > > >
> > > > > > WDYT?
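
The `${state-name}_ttl` naming convention proposed above can be sketched roughly as follows. This is an illustration only, not code from the FLIP or from Flink: the function name, the `BIGINT` placeholder type, and the input list of state names are all assumptions, since the thread does not say which type the TTL column would have or how the state names are obtained.

```python
from typing import Dict, Iterable

# Suffix taken from the ${state-name}_ttl proposal in the thread.
TTL_SUFFIX = "_ttl"


def list_readable_metadata(state_names: Iterable[str]) -> Dict[str, str]:
    """Return one metadata key per state name, e.g. 'counts' -> 'counts_ttl'.

    The mapped value is a placeholder SQL type string (an assumption).
    """
    metadata: Dict[str, str] = {}
    for name in state_names:
        key = f"{name}{TTL_SUFFIX}"
        if key in metadata:
            # Two states with the same name would collide on one metadata key.
            raise ValueError(f"duplicate state name: {name!r}")
        metadata[key] = "BIGINT"
    return metadata
```

Calling `list_readable_metadata(["counts", "last_seen"])` yields the keys `counts_ttl` and `last_seen_ttl`, one per state defined on the table.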
> > > > > >
> > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > >
> > > > > > Introducing a new connector type at this stage may unnecessarily
> > > > > > complicate the system. Given that every table already belongs to a
> > > > > > Catalog, which is designed to provide a Factory for building source or
> > > > > > sink connectors, I propose integrating a dedicated StateCatalog
> > > > > > instead. This approach would allow us to:
> > > > > >
> > > > > > 1. Leverage the Catalog’s existing capabilities to manage TTL metadata
> > > > > > (e.g., state names and TTL logic) without duplicating functionality.
> > > > > > 2. Provide a unified interface for connector instantiation and metadata
> > > > > > handling through the Catalog’s Factory pattern.
> > > > > >
> > > > > > Would this design decision better align with our architecture’s
> > > > > > extensibility and reduce redundancy?
> > > > > >
> > > > > > > Up until now we've seen even in TB savepoints that the number of keys
> > > > > > > can be extremely huge but not the per key state itself.
> > > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > > separate jira.
> > > > > >
> > > > > > +1 for a separate jira.
> > > > > >
> > > > > > Best,
> > > > > > Shengkai
> > > > > >
> > > > > > On Mon, Mar 10, 2025 at 7:05 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Shengkai,
> > > > > > >
> > > > > > > Please see my comments inline.
> > > > > > >
> > > > > > > BR,
> > > > > > > G
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi, Gabor. Thanks for the FLIP. I have some questions about the
> > > > > > > > FLIP:
> > > > > > > >
> > > > > > > > 1. State TTL for Value Columns
> > > > > > > > How can users retrieve the state TTL (Time-to-Live) for each value
> > > > > > > > column? From my understanding of the current design, it seems that
> > > > > > > > this functionality is not supported. Could you clarify if there are
> > > > > > > > plans to address this limitation?
> > > > > > > >
> > > > > > >
> > > > > > > Since the state processor API does not yet expose this information,
> > > > > > > this would require several steps.
> > > > > > > First, support needs to be added to the state processor API, which
> > > > > > > can then be exposed through the SQL API.
> > > > > > > This is definitely a useful future improvement and can be handled in
> > > > > > > a separate jira.
> > > > > > >
> > > > > > >
> > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > > The metadata information described in the FLIP appears to be
> > > > > > > > intended to describe the state files stored at a specific location.
> > > > > > > > To me, this concept aligns more closely with system tables like
> > > > > > > > pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
> > > > > > > >
> > > > > > >
> > > > > > > Adding a new connector with `savepoint-metadata` is a possibility
> > > > > > > where we can create such functionality.
> > > > > > > I'm not against that, I just want to reach a common agreement that we
> > > > > > > would like to move in that direction.
> > > > > > > (As a side note, not just PG but Spark also has a similar approach,
> > > > > > > and I basically like the idea.)
> > > > > > > If we go in that direction, savepoint metadata can be exposed in a
> > > > > > > way that one row represents an operator with its values, something
> > > > > > > like this:
> > > > > > >
> > > > > > > ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┐
> > > > > > > │operatorN│operatorU│operatorH│paralleli│maxParall│subtaskSt│coordinat│totalSta│
> > > > > > > │ame      │id       │ash      │sm       │elism    │atesCount│orStateSi│tesSizeI│
> > > > > > > │         │         │         │         │         │         │zeInBytes│nBytes  │
> > > > > > > ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > > > │Source:  │datagen-s│47aee9439│2        │128      │2        │16       │546     │
> > > > > > > │datagen-s│ource-uid│4d6ea26e2│         │         │         │         │        │
> > > > > > > │ource    │         │d544bef0a│         │         │         │         │        │
> > > > > > > │         │         │37bb5    │         │         │         │         │        │
> > > > > > > ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > > > │long-udf-│long-udf-│6ed3f40bf│2        │128      │2        │0        │0       │
> > > > > > > │with-mast│with-mast│f3c8dfcdf│         │         │         │         │        │
> > > > > > > │er-hook  │er-hook-u│cb95128a1│         │         │         │         │        │
> > > > > > > │         │id       │018f1    │         │         │         │         │        │
> > > > > > > ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > > > │value-pro│value-pro│ca4f5fe9a│2        │128      │2        │0        │40726   │
> > > > > > > │cess     │cess-uid │637b656f0│         │         │         │         │        │
> > > > > > > │         │         │9ea78b3e7│         │         │         │         │        │
> > > > > > > │         │         │a15b9    │         │         │         │         │        │
> > > > > > > ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
> > > > > > >
> > > > > > > This table can then be joined with the tables created by the existing
> > > > > > > `savepoint` connector based on the UID hash (which is unique and
> > > > > > > always exists).
> > > > > > > This would mean that the existing table would need only a single
> > > > > > > metadata column, which is the UID hash.
> > > > > > > WDYT?
> > > > > > > @zakelly, plz share your thoughts too.
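
The join on the UID hash described above can be sketched roughly as follows. This is an illustration only: the operator rows reuse values from the example table in the thread, while the state rows, their `k` and `value` fields, and the function name are hypothetical stand-ins for what a `savepoint` table with a hash metadata column might return.

```python
from typing import Dict, Iterable, Iterator, List

# Operator-level rows, using values from the example table in the thread.
OPERATOR_METADATA: List[Dict[str, object]] = [
    {"operatorUid": "datagen-source-uid",
     "operatorHash": "47aee94394d6ea26e2d544bef0a37bb5",
     "parallelism": 2, "maxParallelism": 128},
    {"operatorUid": "value-process-uid",
     "operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9",
     "parallelism": 2, "maxParallelism": 128},
]

# Hypothetical state rows; each one carries only the UID hash as metadata.
STATE_ROWS: List[Dict[str, object]] = [
    {"operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "k": 1, "value": 10},
    {"operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "k": 2, "value": 20},
]


def join_on_operator_hash(
    state_rows: Iterable[Dict[str, object]],
    operator_metadata: Iterable[Dict[str, object]],
) -> Iterator[Dict[str, object]]:
    """Inner-join state rows with operator rows on the operatorHash field."""
    by_hash = {meta["operatorHash"]: meta for meta in operator_metadata}
    for row in state_rows:
        meta = by_hash.get(row["operatorHash"])
        if meta is not None:
            # Operator fields are attached to every matching state row.
            yield {**row, **meta}
```

Because the hash identifies exactly one operator, the lookup is a one-to-many join: each operator row is attached to all of its state rows, and the human-readable UID becomes available without storing it in the state table.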
> > > > > > >
> > > > > > >
> > > > > > > > If we opt to use metadata columns, every record in the table would
> > > > > > > > end up having identical values for these columns (please correct me
> > > > > > > > if I’m mistaken). On the other hand, the state connector requires
> > > > > > > > users to specify an operator UID or operator UID hash, after which
> > > > > > > > it outputs user-defined values in its records. This approach feels
> > > > > > > > somewhat redundant to me.
> > > > > > > >
> > > > > > >
> > > > > > > If we add a new `savepoint-metadata` connector, then this can be
> > > > > > > addressed.
> > > > > > > On the other hand, UID and UID hash have an either-or relationship
> > > > > > > from a config perspective, so when a user provides the UID, they may
> > > > > > > be interested in the hash for further calculations (Flink internals
> > > > > > > depend entirely on the hash). Printing out the human-readable UID is
> > > > > > > an explicit requirement from the user side because hashes are not
> > > > > > > human-readable.
> > > > > > >
> > > > > > >
> > > > > > > > 3. Handling LIST and MAP States in the State Connector
> > > > > > > > I have concerns about how the current design handles LIST and MAP
> > > > > > > > states. Specifically, the state connector uses Flink SQL’s MAP and
> > > > > > > > ARRAY types, which implies that it attempts to load entire MAP or
> > > > > > > > LIST states into memory.
> > > > > > > >
> > > > > > > > However, in many real-world scenarios, these states can grow very
> > > > > > > > large. Typically, the state API addresses this by providing an
> > > > > > > > iterator to traverse elements within the state incrementally. I’m
> > > > > > > > unsure whether I’ve missed something in FLIP-496 or FLIP-512, but
> > > > > > > > it seems that the current design might struggle with scalability in
> > > > > > > > such cases.
> > > > > > > >
> > > > > > >
> > > > > > > You're right, the current implementation keeps the state for a single
> > > > > > > key in memory.
> > > > > > > Back then we considered this potential issue and concluded that
> > > > > > > addressing it is not necessary for the initial version and can be
> > > > > > > done as a later improvement.
> > > > > > >
> > > > > > > Up until now we've seen, even in TB-sized savepoints, that the number
> > > > > > > of keys can be extremely large but not the per-key state itself.
> > > > > > > But again, this is a good feature as-is and can be handled in a
> > > > > > > separate jira.
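
The difference between loading a whole LIST state for one key into memory and traversing it incrementally, as discussed in point 3, can be sketched roughly as follows. This is a language-neutral illustration, not Flink code: both function names are hypothetical, and a plain Python iterable stands in for the state iterator.

```python
from typing import Iterable, Iterator, List


def read_list_state_eagerly(entries: Iterable[int]) -> List[int]:
    """Materialize every element for one key; memory grows with state size."""
    return list(entries)


def read_list_state_in_chunks(
    entries: Iterable[int], chunk_size: int
) -> Iterator[List[int]]:
    """Yield bounded chunks, so at most chunk_size elements are held at once."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk: List[int] = []
    for entry in entries:
        chunk.append(entry)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk
```

The eager variant corresponds to mapping a state onto a single ARRAY or MAP value, while the chunked variant bounds memory per key at the cost of producing several partial results.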
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Shengkai
> > > > > > > >
> > > > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > > [2]
> > > > > > > > https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > > > > > > >
> > > > > > > > On Mon, Mar 3, 2025 at 2:00 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Zakelly,
> > > > > > > > >
> > > > > > > > > To keep things simple, the target is `METADATA VIRTUAL` as the
> > > > > > > > > keywords for the definition.
> > > > > > > > > If it's not too complex, the latter can be added too.
> > > > > > > > >
> > > > > > > > > BR,
> > > > > > > > > G
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Gabor,
> > > > > > > > > >
> > > > > > > > > > +1 for this.
> > > > > > > > > >
> > > > > > > > > > Will the metadata column use `METADATA VIRTUAL` as the keywords
> > > > > > > > > > for its definition, or `METADATA FROM xxx VIRTUAL` for renaming,
> > > > > > > > > > just like the Kafka table?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Zakelly
> > > > > > > > > >
> > > > > > > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > I'd like to start a discussion of FLIP-512: Add meta
> > > > > > > > > > > information to SQL state connector [1].
> > > > > > > > > > > Feel free to add your thoughts to make this feature better.
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > > > > > > > >
> > > > > > > > > > > BR,
> > > > > > > > > > > G
