Hi, Gabor. Thanks for your response.

> 1. State TTL for Value Columns

Thank you for addressing the limitations here. However, I believe it would
be beneficial to further clarify the API in this FLIP regarding how users
can specify the TTL column.

One potential approach that comes to mind is using a standardized naming
convention such as ${state-name}_ttl for the metadata column that defines
the TTL value. In terms of implementation, the listReadableMetadata
function could:

1. Read the table’s columns and configuration,
2. Extract all defined state names, and
3. Return a structured list of metadata entries formatted as
${state-name}_ttl.
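
To make the naming convention concrete, here is a minimal sketch of the three steps above. It is only an illustration under stated assumptions: in Flink, SupportsReadingMetadata#listReadableMetadata() returns a Map<String, DataType>, whereas this dependency-free version takes the already extracted state names as input and uses plain strings for the types. The class name, the example state names, and the BIGINT type (TTL in milliseconds) are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the ${state-name}_ttl convention; not Flink code. */
class TtlMetadataKeys {

    static final String TTL_SUFFIX = "_ttl";

    /**
     * Derives one metadata key per state name, preserving declaration order.
     * The state names are assumed to have been extracted from the table's
     * columns and configuration beforehand.
     */
    static Map<String, String> listReadableMetadata(List<String> stateNames) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String stateName : stateNames) {
            // BIGINT is an assumed type for the TTL value, not part of the FLIP.
            metadata.put(stateName + TTL_SUFFIX, "BIGINT");
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Example state names are made up for illustration.
        System.out.println(
                listReadableMetadata(List.of("my_value_state", "my_list_state")));
    }
}
```

With this convention a user could declare a metadata column whose key is derived purely from the state name, without any extra option.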

WDYT?

> 2. Adding a new connector with `savepoint-metadata`

Introducing a new connector type at this stage may unnecessarily complicate
the system. Given that every table already belongs to a Catalog, which is
designed to provide a Factory for building source or sink connectors, I
propose integrating a dedicated StateCatalog instead. This approach would
allow us to:

1. Leverage the Catalog’s existing capabilities to manage TTL metadata
(e.g., state names and TTL logic) without duplicating functionality.
2. Provide a unified interface for connector instantiation and metadata
handling through the Catalog’s Factory pattern.

Would this design decision better align with our architecture’s
extensibility and reduce redundancy?
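
As a rough illustration of what I have in mind, the sketch below shows a catalog that both answers TTL metadata lookups and hands out the factory used to build connectors. Everything in it is hypothetical: StateCatalogSketch, ConnectorFactory, registerState and ttlMillis are made-up names, not Flink classes or methods. It is only loosely modeled on the idea that a catalog can supply a factory, and it reuses the existing `savepoint` connector name as the identifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; none of these types are Flink classes. */
class StateCatalogSketch {

    /** Stand-in for a connector factory, identified by a connector name. */
    interface ConnectorFactory {
        String identifier();
    }

    // State name -> TTL in milliseconds, owned by the catalog.
    private final Map<String, Long> ttlMillisByState = new LinkedHashMap<>();

    void registerState(String stateName, long ttlMillis) {
        ttlMillisByState.put(stateName, ttlMillis);
    }

    /** TTL metadata lookup, answered by the catalog itself. */
    Optional<Long> ttlMillis(String stateName) {
        return Optional.ofNullable(ttlMillisByState.get(stateName));
    }

    /** Single entry point for building sources or sinks over the savepoint. */
    Optional<ConnectorFactory> getFactory() {
        ConnectorFactory factory = () -> "savepoint";
        return Optional.of(factory);
    }

    public static void main(String[] args) {
        StateCatalogSketch catalog = new StateCatalogSketch();
        catalog.registerState("my_value_state", 3_600_000L);
        System.out.println(catalog.ttlMillis("my_value_state"));
        System.out.println(catalog.getFactory().map(ConnectorFactory::identifier));
    }
}
```

The point of the sketch is only that state names and TTL information live in one place, so no second connector type is needed to expose them.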

> Up until now we've seen even in TB savepoints that the number of keys can
> be extremely huge but not the per key state itself.
> But again, this is a good feature as-is and can be handled in a separate
> jira.

+1 for a separate jira.

Best,
Shengkai

Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 10, 2025 at 19:05:

> Hi Shengkai,
>
> Please see my comments inline.
>
> BR,
> G
>
>
> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> > Hi, Gabor. Thanks for the FLIP. I have some questions about it:
> >
> > 1. State TTL for Value Columns
> > How can users retrieve the state TTL (Time-to-Live) for each value
> > column?
> > From my understanding of the current design, it seems that this
> > functionality is not supported. Could you clarify if there are plans to
> > address this limitation?
> >
> >
>
> The State Processor API does not expose this information yet, so this
> would take several steps.
> First, support has to be added to the State Processor API; it can then be
> exposed through the SQL API.
> This is definitely a useful future improvement and can be handled in a
> separate Jira.
>
>
> > 2. Metadata Table vs. Metadata Column
> > The metadata information described in the FLIP appears to be intended to
> > describe the state files stored at a specific location. To me, this
> > concept aligns more closely with system tables like pg_tables in
> > PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
> >
>
> Adding a new `savepoint-metadata` connector is one way to provide such
> functionality.
> I'm not against it; I just want us to agree that we would like to move in
> that direction.
> (As a side note, not just PG but Spark also takes a similar approach, and
> I basically like the idea.)
> If we go in that direction, savepoint metadata could be exposed so that
> each row represents an operator with its values, something like this:
>
>
> ┌─────────────────────────┬─────────────────────────────┬────────────────────────────────┬───────────┬──────────────┬──────────────────┬───────────────────────────┬──────────────────────┐
> │operatorName             │operatorUid                  │operatorHash                    │parallelism│maxParallelism│subtaskStatesCount│coordinatorStateSizeInBytes│totalStatesSizeInBytes│
> ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> │Source: datagen-source   │datagen-source-uid           │47aee94394d6ea26e2d544bef0a37bb5│2          │128           │2                 │16                         │546                   │
> ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> │long-udf-with-master-hook│long-udf-with-master-hook-uid│6ed3f40bff3c8dfcdfcb95128a1018f1│2          │128           │2                 │0                          │0                     │
> ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> │value-process            │value-process-uid            │ca4f5fe9a637b656f09ea78b3e7a15b9│2          │128           │2                 │0                          │40726                 │
> └─────────────────────────┴─────────────────────────────┴────────────────────────────────┴───────────┴──────────────┴──────────────────┴───────────────────────────┴──────────────────────┘
>
> This table can then be joined with the tables created by the existing
> `savepoint` connector, based on the UID hash (which is unique and always
> exists).
> This would mean that the existing table needs only a single metadata
> column: the UID hash.
> WDYT?
> @zakelly, please share your thoughts too.
>
>
> > If we opt to use metadata columns, every record in the table would end up
> > having identical values for these columns (please correct me if I’m
> > mistaken). On the other hand, the state connector requires users to
> > specify an operator UID or operator UID hash, after which it outputs
> > user-defined values in its records. This approach feels somewhat
> > redundant to me.
> >
>
> If we add a new `savepoint-metadata` connector, this can be addressed.
> On the other hand, the UID and the UID hash have an either-or relationship
> from a configuration perspective, so a user who provides the UID may still
> be interested in the hash for further calculations (all Flink internals
> depend on the hash). Printing the human-readable UID is an explicit
> requirement from the user side because hashes are not human-readable.
>
>
> > 3. Handling LIST and MAP States in the State Connector
> > I have concerns about how the current design handles LIST and MAP states.
> > Specifically, the state connector uses Flink SQL’s MAP and ARRAY types,
> > which implies that it attempts to load entire MAP or LIST states into
> > memory.
> >
> > However, in many real-world scenarios, these states can grow very large.
> > Typically, the state API addresses this by providing an iterator to
> > traverse elements within the state incrementally. I’m unsure whether I’ve
> > missed something in FLIP-496 or FLIP-512, but it seems that the current
> > design might struggle with scalability in such cases.
> >
>
> You are right, the current implementation keeps the state for a single key
> in memory.
> We considered this potential issue back then and concluded that handling it
> is not strictly needed for the initial version and can be done as a later
> improvement.
>
> Up until now we've seen even in TB savepoints that the number of keys can
> be extremely huge but not the per key state itself.
> But again, this is a good feature as-is and can be handled in a separate
> jira.
>
>
> >
> > Best,
> > Shengkai
> >
> > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> >
> > Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 3, 2025 at 02:00:
> >
> > > Hi Zakelly,
> > >
> > > To keep things simple, the target is `METADATA VIRTUAL` as the keywords
> > > for the definition.
> > > If it turns out not to be too complex, the latter can be added too.
> > >
> > > BR,
> > > G
> > >
> > >
> > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> > >
> > > > Hi Gabor,
> > > >
> > > > +1 for this.
> > > >
> > > > Will the metadata column use `METADATA VIRTUAL` as keywords for the
> > > > definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
> > > > Kafka table?
> > > >
> > > >
> > > > Best,
> > > > Zakelly
> > > >
> > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to start a discussion of FLIP-512: Add meta information to
> > > > > SQL state connector [1].
> > > > > Feel free to add your thoughts to make this feature better.
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > >
> > > > > BR,
> > > > > G
> > > > >
> > > >
> > >
> >
>
