Re: [DISCUSS] FLIP-512: Add meta information to SQL state connector

Gabor Somogyi Tue, 11 Mar 2025 01:58:17 -0700

Hi Shengkai,

> 1. State TTL for Value Columns


From a directional perspective, I agree with your idea of how it can be
implemented. As I've mentioned previously, TTL information is not exposed
by the State Processor API (which the SQL state connector uses to read
data), so unless somebody shows me otherwise, this FLIP is not going to
address it, to avoid feature creep. Our users are also interested in TTL,
so sooner or later we're going to expose it; it's a matter of scheduling.

> 2. Adding a new connector with `savepoint-metadata`

I'm not sure I fully understand your point related to StateCatalog. First
of all, I couldn't agree more that a StateCatalog is needed; it is a
planned building block in an upcoming FLIP, but I'm not sure how it can
help now. Either way, your knowledge will be essential when we add
StateCatalog. Let me lay out my understanding of this area:
* First, we need CREATE TABLE statements to access state data and metadata.
* Once we have that, we can add a StateCatalog, which could ease the life
of users by, for example, providing off-the-shelf tables without having to
write CREATE TABLE statements.

User expectations:
* See state data (this is already covered by the existing connector)
* See metadata about state data like TTL (this can be added as a metadata
column, as you suggested, since it belongs to the data)
* See metadata about operators (this can be provided by `savepoint-metadata`)

It's important to highlight that the state data table format differs from
the state metadata table format. Namely, one table has rows for state
values and the other has rows for operators, right?
I think that's why you pointed out that the suggested metadata columns
are somewhat clunky.
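To make the difference in row granularity concrete, here is a minimal Java
sketch of the two table shapes and of the UID-hash join proposed earlier in
this thread. It is an in-memory illustration only, not connector code: the
record and method names are hypothetical, the operator fields are a subset
of the example metadata output, and the state keys and values are made-up
sample data.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration: per-operator metadata rows joined with per-key state rows on the UID hash.
class UidHashJoinSketch {

    // One row per operator, like the proposed `savepoint-metadata` table (subset of its columns).
    record OperatorRow(String operatorName, String operatorUid, String operatorHash,
                       int parallelism, long totalStatesSizeInBytes) {}

    // One row per state key; operatorHash stands in for the single UID-hash metadata column.
    record StateRow(String operatorHash, String key, long value) {}

    // A state row enriched with the metadata of the operator that owns it.
    record JoinedRow(String operatorUid, String key, long value, long totalStatesSizeInBytes) {}

    static List<JoinedRow> joinOnUidHash(List<OperatorRow> operators, List<StateRow> states) {
        // The UID hash is unique per operator, so it can serve as the lookup key.
        Map<String, OperatorRow> byHash = operators.stream()
                .collect(Collectors.toMap(OperatorRow::operatorHash, Function.identity()));
        return states.stream()
                .filter(s -> byHash.containsKey(s.operatorHash())) // inner-join semantics
                .map(s -> {
                    OperatorRow op = byHash.get(s.operatorHash());
                    return new JoinedRow(op.operatorUid(), s.key(), s.value(),
                            op.totalStatesSizeInBytes());
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String hash = "ca4f5fe9a637b656f09ea78b3e7a15b9"; // value-process operator from the example output
        List<OperatorRow> operators = List.of(
                new OperatorRow("value-process", "value-process-uid", hash, 2, 40726L));
        List<StateRow> states = List.of( // made-up sample keys and values
                new StateRow(hash, "k1", 10L),
                new StateRow(hash, "k2", 20L));
        joinOnUidHash(operators, states).forEach(System.out::println);
    }
}
```

Every state row of an operator picks up the same operator-level values,
which is the redundancy raised earlier in the thread; keeping operator
metadata in its own table and joining on demand avoids storing it per row.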

In conclusion, I agree to add a ${state-name}_ttl metadata column later on,
since it belongs to the state value, and to add a new table type for
metadata (as you suggested, similar to PG [1]). Please see how Spark does
this too [2].
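The ${state-name}_ttl convention from your proposal could be derived roughly
as follows. This is a hypothetical Java sketch, not Flink code: it assumes
each non-key column is backed by a state of the same name unless an
override is given (`stateNameOverrides` is an assumed input), and it only
derives the key names. In Flink, a source that exposes metadata does so via
`SupportsReadingMetadata#listReadableMetadata()`, which returns a map from
key to `DataType`; the data type of a TTL column is left open here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the ${state-name}_ttl naming convention; not an existing Flink API.
class TtlMetadataKeys {

    static final String TTL_SUFFIX = "_ttl";

    // columns: physical columns in declaration order (step 1, already read from the table)
    // keyColumn: the column holding the state key, which has no TTL of its own
    // stateNameOverrides: column name -> state name where they differ (assumed input)
    static List<String> ttlMetadataKeys(List<String> columns, String keyColumn,
                                        Map<String, String> stateNameOverrides) {
        Set<String> keys = new LinkedHashSet<>(); // keeps column order, drops duplicates
        for (String column : columns) {
            if (column.equals(keyColumn)) {
                continue;
            }
            // Step 2: resolve the state name backing this column.
            String stateName = stateNameOverrides.getOrDefault(column, column);
            // Step 3: format the metadata key as ${state-name}_ttl.
            keys.add(stateName + TTL_SUFFIX);
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        List<String> keys = ttlMetadataKeys(
                List.of("k", "total", "names"), "k", Map.of("names", "name-list"));
        System.out.println(keys); // prints [total_ttl, name-list_ttl]
    }
}
```

Deduplicating keeps the key list valid if two columns ever map to the same
state name.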

If you have a better approach, please elaborate with more details to help
me understand your point.

> So far, even in TB-sized savepoints, we've seen that the number of keys
> can be extremely large, but not the per-key state itself.
> But again, this is a good feature as-is and can be handled in a separate
> JIRA.

I've just created https://issues.apache.org/jira/browse/FLINK-37456.
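To illustrate the per-key memory concern behind that ticket, here is a
minimal Java sketch contrasting two ways of reading one key's LIST state:
materializing it into a single collection (what mapping it to a SQL ARRAY
implies) versus walking an iterator and emitting one row per element. The
class and method names are hypothetical, a plain `Iterable` stands in for
the state, and this is not meant to describe what FLINK-37456 will
implement.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch contrasting two ways of reading one key's LIST state.
class PerKeyListStateReading {

    // Materializing: the whole per-key state becomes one collection (one ARRAY value),
    // so memory grows with the number of elements stored under that key.
    static <T> List<T> materialize(Iterable<T> perKeyState) {
        List<T> all = new ArrayList<>();
        for (T element : perKeyState) {
            all.add(element);
        }
        return all;
    }

    // Incremental: emit one (key, element) row per element while walking the iterator;
    // the reader itself never accumulates the elements.
    static <K, T> long emitIncrementally(K key, Iterable<T> perKeyState, BiConsumer<K, T> rowSink) {
        long emitted = 0;
        Iterator<T> it = perKeyState.iterator();
        while (it.hasNext()) {
            rowSink.accept(key, it.next());
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        Iterable<String> state = List.of("a", "b", "c"); // stand-in for one key's LIST state
        System.out.println(materialize(state));
        long n = emitIncrementally("user-1", state, (k, v) -> System.out.println(k + " -> " + v));
        System.out.println(n + " rows emitted");
    }
}
```

The trade-off is the output shape: the incremental variant yields one row
per element rather than one row per key.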

[1] https://www.postgresql.org/docs/current/view-pg-tables.html
[2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source

BR,
G


On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <[email protected]> wrote:

> Hi, Gabor. Thanks for your response.
>
> > 1. State TTL for Value Columns
>
> Thank you for addressing the limitations here. However, I believe it would
> be beneficial to further clarify the API in this FLIP regarding how users
> can specify the TTL column.
>
> One potential approach that comes to mind is using a standardized naming
> convention such as ${state-name}_ttl for the metadata column that defines
> the TTL value. In terms of implementation, the listReadableMetadata
> function could:
>
> 1. Read the table’s columns and configuration,
> 2. Extract all defined state names, and
> 3. Return a structured list of metadata entries formatted as
> ${state-name}_ttl.
>
> WDYT?
>
> > 2. Adding a new connector with `savepoint-metadata`
>
> Introducing a new connector type at this stage may unnecessarily complicate
> the system. Given that every table already belongs to a Catalog, which is
> designed to provide a Factory for building source or sink connectors, I
> propose integrating a dedicated StateCatalog instead. This approach would
> allow us to:
>
> 1. Leverage the Catalog’s existing capabilities to manage TTL metadata
> (e.g., state names and TTL logic) without duplicating functionality.
> 2. Provide a unified interface for connector instantiation and metadata
> handling through the Catalog’s Factory pattern.
>
> Would this design decision better align with our architecture’s
> extensibility and reduce redundancy?
>
> > Up until now we've seen even in TB savepoints that the number of keys can
> > be extremely huge but not the per key state itself.
> > But again, this is a good feature as-is and can be handled in a separate
> > jira.
>
> +1 for a separate jira.
>
> Best,
> Shengkai
>
> On Mon, Mar 10, 2025 at 19:05 Gabor Somogyi <[email protected]> wrote:
>
> > Hi Shengkai,
> >
> > Please see my comments inline.
> >
> > BR,
> > G
> >
> >
> > On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <[email protected]> wrote:
> >
> > > Hi, Gabor. Thanks for the FLIP. I have some questions about it:
> > >
> > > 1. State TTL for Value Columns
> > > How can users retrieve the state TTL (Time-to-Live) for each value column?
> > > From my understanding of the current design, it seems that this
> > > functionality is not supported. Could you clarify if there are plans to
> > > address this limitation?
> > >
> >
> > Since the State Processor API does not yet expose this information, this
> > would require several steps.
> > First, support needs to be added to the State Processor API, and then it
> > can be exposed on the SQL API.
> > This is definitely a useful future improvement and can be handled in a
> > separate JIRA.
> >
> >
> > > 2. Metadata Table vs. Metadata Column
> > > The metadata information described in the FLIP appears to be intended to
> > > describe the state files stored at a specific location. To me, this concept
> > > aligns more closely with system tables like pg_tables in PostgreSQL [1] or
> > > the INFORMATION_SCHEMA in MySQL [2].
> > >
> >
> > Adding a new connector with `savepoint-metadata` is one way to provide
> > such functionality.
> > I'm not against it; I just want a common agreement that we would like to
> > move in that direction.
> > (As a side note, not just PG but Spark also has a similar approach, and I
> > basically like the idea.)
> > If we went in that direction, savepoint metadata could be exposed so that
> > one row represents an operator with its values, something like this:
> >
> > ┌───────────────────────────┬───────────────────────────────┬──────────────────────────────────┬─────────────┬────────────────┬────────────────────┬─────────────────────────────┬────────────────────────┐
> > │ operatorName              │ operatorUid                   │ operatorHash                     │ parallelism │ maxParallelism │ subtaskStatesCount │ coordinatorStateSizeInBytes │ totalStatesSizeInBytes │
> > ├───────────────────────────┼───────────────────────────────┼──────────────────────────────────┼─────────────┼────────────────┼────────────────────┼─────────────────────────────┼────────────────────────┤
> > │ Source: datagen-source    │ datagen-source-uid            │ 47aee94394d6ea26e2d544bef0a37bb5 │ 2           │ 128            │ 2                  │ 16                          │ 546                    │
> > ├───────────────────────────┼───────────────────────────────┼──────────────────────────────────┼─────────────┼────────────────┼────────────────────┼─────────────────────────────┼────────────────────────┤
> > │ long-udf-with-master-hook │ long-udf-with-master-hook-uid │ 6ed3f40bff3c8dfcdfcb95128a1018f1 │ 2           │ 128            │ 2                  │ 0                           │ 0                      │
> > ├───────────────────────────┼───────────────────────────────┼──────────────────────────────────┼─────────────┼────────────────┼────────────────────┼─────────────────────────────┼────────────────────────┤
> > │ value-process             │ value-process-uid             │ ca4f5fe9a637b656f09ea78b3e7a15b9 │ 2           │ 128            │ 2                  │ 0                           │ 40726                  │
> > └───────────────────────────┴───────────────────────────────┴──────────────────────────────────┴─────────────┴────────────────┴────────────────────┴─────────────────────────────┴────────────────────────┘
> >
> > This table could then be joined with the tables created by the existing
> > `savepoint` connector on the UID hash (which is unique and always
> > exists).
> > This would mean that the existing table would need only a single
> > metadata column: the UID hash.
> > WDYT?
> > @Zakelly, please share your thoughts too.
> >
> >
> > > If we opt to use metadata columns, every record in the table would end up
> > > having identical values for these columns (please correct me if I’m
> > > mistaken). On the other hand, the state connector requires users to specify
> > > an operator UID or operator UID hash, after which it outputs user-defined
> > > values in its records. This approach feels somewhat redundant to me.
> > >
> >
> > If we add a new `savepoint-metadata` connector, then this can be
> > addressed.
> > On the other hand, UID and UID hash have an either-or relationship from
> > a configuration perspective, so when users provide the UID, they may be
> > interested in the hash for further calculations (Flink internals all
> > depend on the hash). Printing the human-readable UID is an explicit
> > requirement from users because hashes are not human-readable.
> >
> >
> > > 3. Handling LIST and MAP States in the State Connector
> > > I have concerns about how the current design handles LIST and MAP states.
> > > Specifically, the state connector uses Flink SQL’s MAP and ARRAY types,
> > > which implies that it attempts to load entire MAP or LIST states into
> > > memory.
> > >
> > > However, in many real-world scenarios, these states can grow very large.
> > > Typically, the state API addresses this by providing an iterator to
> > > traverse elements within the state incrementally. I’m unsure whether I’ve
> > > missed something in FLIP-496 or FLIP-512, but it seems that the current
> > > design might struggle with scalability in such cases.
> > >
> >
> > You see it correctly: the current implementation keeps the state for a
> > single key in memory.
> > Back in the day we considered this potential issue and concluded that it
> > is not necessarily needed for the initial version and can be done as a
> > later improvement.
> >
> > So far, even in TB-sized savepoints, we've seen that the number of keys
> > can be extremely large, but not the per-key state itself.
> > But again, this is a good feature as-is and can be handled in a separate
> > JIRA.
> >
> >
> > >
> > > Best,
> > > Shengkai
> > >
> > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > >
> > > On Mon, Mar 3, 2025 at 02:00 Gabor Somogyi <[email protected]> wrote:
> > >
> > > > Hi Zakelly,
> > > >
> > > > To keep things simple, the target is `METADATA VIRTUAL` as the
> > > > keywords for the definition.
> > > > If it's not too complex, the latter can be added too.
> > > >
> > > > BR,
> > > > G
> > > >
> > > >
> > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <[email protected]> wrote:
> > > >
> > > > > Hi Gabor,
> > > > >
> > > > > +1 for this.
> > > > >
> > > > > Will the metadata column use `METADATA VIRTUAL` as keywords for the
> > > > > definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
> > > > > Kafka table?
> > > > >
> > > > >
> > > > > Best,
> > > > > Zakelly
> > > > >
> > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <[email protected]> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I'd like to start a discussion of FLIP-512: Add meta information to
> > > > > > SQL state connector [1].
> > > > > > Feel free to add your thoughts to make this feature better.
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > > >
> > > > > > BR,
> > > > > > G
> > > > > >
> > > > >
> > > >
> > >
> >
>
