Hi Shengkai,

Please see my comments inline.

BR,
G


On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:

> Hi, Gabor. Thanks for the FLIP. I have some questions about the FLIP:
>
> 1. State TTL for Value Columns
> How can users retrieve the state TTL (Time-to-Live) for each value column?
> From my understanding of the current design, it seems that this
> functionality is not supported. Could you clarify if there are plans to
> address this limitation?
>

Since the state processor API does not yet expose this information, this
would require several steps.
First, support needs to be added to the state processor API, which can then
be exposed through the SQL API.
This is definitely a useful future improvement and can be handled in a
separate Jira.


> 2. Metadata Table vs. Metadata Column
> The metadata information described in the FLIP appears to be intended to
> describe the state files stored at a specific location. To me, this concept
> aligns more closely with system tables like pg_tables in PostgreSQL [1] or
> the INFORMATION_SCHEMA in MySQL [2].
>

Adding a new `savepoint-metadata` connector is one way to provide such
functionality.
I'm not against that, I just want us to agree that we would like to move
in that direction.
(As a side note, not just PG but Spark also has a similar approach, and I
basically like the idea.)
If we go in that direction, savepoint metadata could be exposed so that
one row represents an operator with its values, something like this:

┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┐
│operatorN│operatorU│operatorH│paralleli│maxParall│subtaskSt│coordinat│totalSta│
│ame      │id       │ash      │sm       │elism    │atesCount│orStateSi│tesSizeI│
│         │         │         │         │         │         │zeInBytes│nBytes  │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
│Source:  │datagen-s│47aee9439│2        │128      │2        │16       │546     │
│datagen-s│ource-uid│4d6ea26e2│         │         │         │         │        │
│ource    │         │d544bef0a│         │         │         │         │        │
│         │         │37bb5    │         │         │         │         │        │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
│long-udf-│long-udf-│6ed3f40bf│2        │128      │2        │0        │0       │
│with-mast│with-mast│f3c8dfcdf│         │         │         │         │        │
│er-hook  │er-hook-u│cb95128a1│         │         │         │         │        │
│         │id       │018f1    │         │         │         │         │        │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
│value-pro│value-pro│ca4f5fe9a│2        │128      │2        │0        │40726   │
│cess     │cess-uid │637b656f0│         │         │         │         │        │
│         │         │9ea78b3e7│         │         │         │         │        │
│         │         │a15b9    │         │         │         │         │        │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤

This table can then be joined with the tables created by the existing
`savepoint` connector, based on the UID hash (which is unique and always
exists).
This would mean that the existing table would need only a single metadata
column, namely the UID hash.
WDYT?
@zakelly, please share your thoughts too.
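
To make the join idea concrete, here is a minimal sketch. It uses Python's
built-in sqlite3 purely as a stand-in for Flink SQL, so it says nothing
about the actual connector syntax. The table names `savepoint_metadata` and
`state_table`, the column names, and the two state rows are assumptions made
up for illustration; only the operator names, UIDs and hashes are taken from
the table above.

```python
import sqlite3

# In-memory database standing in for two Flink SQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE savepoint_metadata (      -- one row per operator
    operator_name TEXT,
    operator_uid  TEXT,
    operator_hash TEXT PRIMARY KEY,
    parallelism   INTEGER
);
CREATE TABLE state_table (             -- keyed state of one operator
    k             INTEGER,
    v             TEXT,
    operator_hash TEXT                 -- the single metadata column
);
""")

conn.executemany(
    "INSERT INTO savepoint_metadata VALUES (?, ?, ?, ?)",
    [
        ("Source: datagen-source", "datagen-source-uid",
         "47aee94394d6ea26e2d544bef0a37bb5", 2),
        ("value-process", "value-process-uid",
         "ca4f5fe9a637b656f09ea78b3e7a15b9", 2),
    ],
)
# Made-up state rows belonging to the value-process operator.
conn.executemany(
    "INSERT INTO state_table VALUES (?, ?, ?)",
    [
        (1, "a", "ca4f5fe9a637b656f09ea78b3e7a15b9"),
        (2, "b", "ca4f5fe9a637b656f09ea78b3e7a15b9"),
    ],
)

# Enrich every state row with the human-readable operator information.
rows = conn.execute("""
    SELECT s.k, s.v, m.operator_name, m.operator_uid
    FROM state_table AS s
    JOIN savepoint_metadata AS m ON s.operator_hash = m.operator_hash
    ORDER BY s.k
""").fetchall()

for row in rows:
    print(row)
```

The point of the sketch is that the state table carries only the hash,
while everything that is identical for all rows of an operator lives once
in the metadata table.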


> If we opt to use metadata columns, every record in the table would end up
> having identical values for these columns (please correct me if I’m
> mistaken). On the other hand, the state connector requires users to specify
> an operator UID or operator UID hash, after which it outputs user-defined
> values in its records. This approach feels somewhat redundant to me.
>

If we add a new `savepoint-metadata` connector, then this can be
addressed.
On the other hand, UID and UID hash have an either-or relationship from a
config perspective,
so a user who provides the UID may be interested in the hash for further
calculations
(Flink internals depend entirely on the hash). Printing out the
human-readable UID
is an explicit requirement from the user side because hashes are not
human-readable.
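
The either-or relationship can be sketched as a small validation helper.
This is only an illustration: `resolve_operator` and its parameter names
`uid` and `uid_hash` are hypothetical and are not the connector's real
option keys or code.

```python
from typing import Optional, Tuple


def resolve_operator(uid: Optional[str] = None,
                     uid_hash: Optional[str] = None) -> Tuple[str, str]:
    """Accept exactly one of uid / uid_hash (hypothetical helper)."""
    # Both missing or both present violates the either-or rule.
    if (uid is None) == (uid_hash is None):
        raise ValueError("Specify exactly one of uid or uid_hash")
    if uid is not None:
        return ("uid", uid)
    return ("hash", uid_hash)
```

Whichever of the two the user configures, the other one is the value they
cannot see without a metadata column, which is why exposing both is useful.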


> 3. Handling LIST and MAP States in the State Connector
> I have concerns about how the current design handles LIST and MAP states.
> Specifically, the state connector uses Flink SQL’s MAP and ARRAY types,
> which implies that it attempts to load entire MAP or LIST states into
> memory.
>
> However, in many real-world scenarios, these states can grow very large.
> Typically, the state API addresses this by providing an iterator to
> traverse elements within the state incrementally. I’m unsure whether I’ve
> missed something in FLIP-496 or FLIP-512, but it seems that the current
> design might struggle with scalability in such cases.
>

You've understood it correctly: the current implementation keeps the state
for a single key in memory.
When we designed this, we considered this potential issue and concluded
that incremental loading is not necessarily needed for the initial version
and can be done as a later improvement.

So far we've seen, even in TB-sized savepoints, that the number of keys
can be extremely large but the per-key state itself is not.
But again, this would be a good feature and can be handled in a separate
Jira.
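
The difference between the two approaches can be sketched as follows. This
is a toy model, not connector code: `entries`, `materialize` and
`stream_rows` are hypothetical names, and the generator only stands in for
a state iterator.

```python
from typing import Dict, Iterator, Tuple


def entries(n: int) -> Iterator[Tuple[str, int]]:
    """Stand-in for a per-key MAP state iterator; yields entries lazily."""
    for i in range(n):
        yield (f"k{i}", i)


def materialize(it: Iterator[Tuple[str, int]]) -> Dict[str, int]:
    """Current behaviour (as described above): the whole per-key map is
    loaded into memory and returned as one MAP value."""
    return dict(it)


def stream_rows(key: str,
                it: Iterator[Tuple[str, int]]) -> Iterator[Tuple[str, str, int]]:
    """Possible later improvement: emit one row per map entry, so only
    a single entry needs to be held in memory at a time."""
    for map_key, map_value in it:
        yield (key, map_key, map_value)


whole_map = materialize(entries(3))
incremental = list(stream_rows("user-1", entries(2)))
```

With `materialize`, memory use grows with the size of one key's state; with
`stream_rows`, it stays constant per key at the cost of a different table
shape (one row per entry instead of one row per key).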


>
> Best,
> Shengkai
>
> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>
> On Mon, Mar 3, 2025 at 2:00 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> > Hi Zakelly,
> >
> > To keep things simple, the target is `METADATA VIRTUAL` as the keywords
> > for the definition.
> > If it's not too complex, the latter can be added too.
> >
> > BR,
> > G
> >
> >
> > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> >
> > > Hi Gabor,
> > >
> > > +1 for this.
> > >
> > > Will the metadata column use `METADATA VIRTUAL` as key words for
> > > definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
> > > Kafka table?
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to start a discussion of FLIP-512: Add meta information to
> > > > SQL state connector [1].
> > > > Feel free to add your thoughts to make this feature better.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > >
> > > > BR,
> > > > G
> > > >
> > >
> >
>
