Looking forward to hearing the good news!

Best,
Shengkai

On Wed, Mar 12, 2025 at 22:24, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

> Thanks to both of you for the valuable input!
>
> Let me take a closer look at the suggestions, like the Catalog capabilities and the possibility of embedding TypeInformation or StateDescriptor metadata directly into the raw state files...
>
> BR,
> G
>
> On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> > Thanks for Zakelly's clarification.
> >
> > 1. State TTL for Value Columns
> >
> > +1 to delay the discussion about this.
> >
> > 2. Metadata Table vs. Metadata Column
> >
> > I’d like to share my perspective on the State Catalog proposal. While introducing this capability is beneficial, there is a blocker: the current StateBackend architecture does not permit operators to encode TypeInformation into the state; it only preserves the Serializer. This limitation creates an asymmetry, as operators alone retain knowledge of the data structure’s schema.
> >
> > To address this, I suggest allowing operators to embed TypeInformation or StateDescriptor metadata directly into the raw state files. Such a design would enable the Catalog to:
> >
> > 1. Parse state files and programmatically derive the schema and structural guarantees for each state.
> > 2. Leverage existing Flink Table utilities, such as LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to bridge TypeInformation and DataType conversions.
> >
> > If we cannot store the TypeInformation or StateDescriptor in the raw state files, I am +1 for this FLIP to use metadata columns to retrieve the information.
> >
> > Best,
> > Shengkai
> >
> > On Wed, Mar 12, 2025 at 12:43, Zakelly Lan <zakelly....@gmail.com> wrote:
> >
> > > Hi Gabor and Shengkai,
> > >
> > > Thanks for sharing your thoughts! This is a long discussion, and sorry for the late reply (I'm busy catching up with release 2.0 these days).
> > >
> > > 1. State TTL for Value Columns
> > >
> > > Let me first restate your thoughts to ensure I understand correctly. IIUC, there is no persistent configuration for state TTL in the checkpoint. While you can infer that TTL is enabled by reading the serializer, the checkpoint itself only stores the last access time for each value. So the only thing we can show is the last access time for each value. But not all state backends are required to store this, as they may directly store the expiration time. This will also increase the difficulty of implementation & maintenance.
> > >
> > > This once again underlines the importance of unified metadata for checkpoints. I’m planning on adding this, and we may collaborate on it in the future.
> > >
> > > 2. Metadata Table vs. Metadata Column
> > >
> > > I'm not in favor of adding a new connector for metadata. The metadata is more like one-time information than streaming data that changes all the time, so a separate connector seems to be overkill. It is not easy to withdraw a connector if we have a better solution in the future. I'm not familiar with the current Catalog capabilities, and if it could extract and show some operator-level information from a savepoint, that would be great.
> > >
> > > If the Catalog can't do that, I would consider the current FLIP to be a compromise solution.
> > >
> > > And if we have that unified metadata for checkpoint/savepoint in the future, we may directly register the savepoint in a catalog and create a source without specifying complex columns, as well as describe the savepoint catalog to get the metadata. That's a good solution in my mind.
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >
> > > > Hi Gabor,
> > > >
> > > > > 2. Adding a new connector with `savepoint-metadata`
> > > >
> > > > I would argue against introducing a new connector type named savepoint-metadata, as the existing Catalog mechanism can inherently provide the necessary connector factory capabilities. I’ve detailed this proposal in branch [1]. Please take a moment to review it.
> > > >
> > > > If we introduce a connector named `savepoint-metadata`, it means users can create a temporary table with connector `savepoint-metadata`, and the connector needs to check whether the table schema is the same as the schema we proposed in the FLIP. On the other hand, it is not easy for others to use a metadata table with the same schema.
> > > >
> > > > [1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > > > On Tue, Mar 11, 2025 at 16:56, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > >
> > > > > Hi Shengkai,
> > > > >
> > > > > > 1. State TTL for Value Columns
> > > > >
> > > > > From a directional perspective I agree with your idea of how it can be implemented. Previously I've mentioned that TTL information is not exposed on the state processor API (which the SQL state connector uses to read data), and unless somebody shows me the opposite, this FLIP is not going to address this, to avoid feature creep. Our users are also interested in TTL, so sooner or later we're going to expose it; this is a matter of scheduling.
> > > > >
> > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > >
> > > > > I'm not sure I understand your point related to StateCatalog.
> > > > > First of all, I can't agree more that StateCatalog is needed, and it is a planned building block in an upcoming FLIP, but I'm not sure how it can help now. No matter what, your knowledge is essential when we add StateCatalog. Let me lay out my understanding in this area:
> > > > > * First we need create table statements to access state data and metadata
> > > > > * Once we have that, we can add StateCatalog, which could potentially ease the life of users by, for example, giving off-the-shelf tables without sweating over create table statements
> > > > >
> > > > > User expectations:
> > > > > * See state data (this is fulfilled by the existing connector)
> > > > > * See metadata about state data like TTL (this can be added as a metadata column as you suggested, since it belongs to the data)
> > > > > * See metadata about operators (this can be added from savepoint-metadata)
> > > > >
> > > > > It is important to highlight that the state data table format differs from the state metadata table format. Namely, one table has rows for state values and the other has rows for operators, right?
> > > > > I think that's the reason why you've pointed out that the suggested metadata columns are somewhat clunky.
> > > > >
> > > > > In conclusion, I agree to add the ${state-name}_ttl metadata column later on, since it belongs to the state value, and to add a new table type for metadata (like you suggested, similar to PG [1]). Please see how Spark does that too [2].
> > > > >
> > > > > If you have a better approach, please elaborate with more details and help me understand your point.
> > > > >
> > > > > > Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself.
> > > > > > But again, this is a good feature as-is and can be handled in a separate jira.
> > > > >
> > > > > I've just created https://issues.apache.org/jira/browse/FLINK-37456.
> > > > >
> > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
> > > > >
> > > > > BR,
> > > > > G
> > > > >
> > > > > On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Gabor. Thanks for your response.
> > > > > >
> > > > > > > 1. State TTL for Value Columns
> > > > > >
> > > > > > Thank you for addressing the limitations here. However, I believe it would be beneficial to further clarify the API in this FLIP regarding how users can specify the TTL column.
> > > > > >
> > > > > > One potential approach that comes to mind is using a standardized naming convention such as ${state-name}_ttl for the metadata column that defines the TTL value. In terms of implementation, the listReadableMetadata function could:
> > > > > >
> > > > > > 1. Read the table’s columns and configuration,
> > > > > > 2. Extract all defined state names, and
> > > > > > 3. Return a structured list of metadata entries formatted as ${state-name}_ttl.
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > >
> > > > > > Introducing a new connector type at this stage may unnecessarily complicate the system. Given that every table already belongs to a Catalog, which is designed to provide a Factory for building source or sink connectors, I propose integrating a dedicated StateCatalog instead.
> > > > > > This approach would allow us to:
> > > > > >
> > > > > > 1. Leverage the Catalog’s existing capabilities to manage TTL metadata (e.g., state names and TTL logic) without duplicating functionality.
> > > > > > 2. Provide a unified interface for connector instantiation and metadata handling through the Catalog’s Factory pattern.
> > > > > >
> > > > > > Would this design decision better align with our architecture’s extensibility and reduce redundancy?
> > > > > >
> > > > > > > Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself.
> > > > > > > But again, this is a good feature as-is and can be handled in a separate jira.
> > > > > >
> > > > > > +1 for a separate jira.
> > > > > >
> > > > > > Best,
> > > > > > Shengkai
> > > > > >
> > > > > > On Mon, Mar 10, 2025 at 19:05, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Shengkai,
> > > > > > >
> > > > > > > Please see my comments inline.
> > > > > > >
> > > > > > > BR,
> > > > > > > G
> > > > > > >
> > > > > > > On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi, Gabor. Thanks for the FLIP. I have some questions about it:
> > > > > > > >
> > > > > > > > 1. State TTL for Value Columns
> > > > > > > > How can users retrieve the state TTL (Time-to-Live) for each value column? From my understanding of the current design, it seems that this functionality is not supported. Could you clarify if there are plans to address this limitation?
> > > > > > >
> > > > > > > Since the state processor API does not yet expose this information, this would require several steps.
> > > > > > > First, support needs to be added to the state processor API, which can then be exposed on the SQL API.
> > > > > > > This is definitely a useful future improvement and can be handled in a separate jira.
> > > > > > >
> > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > > The metadata information described in the FLIP appears to be intended to describe the state files stored at a specific location. To me, this concept aligns more closely with system tables like pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
> > > > > > >
> > > > > > > Adding a new connector with `savepoint-metadata` is a possibility where we can create such functionality.
> > > > > > > I'm not against that, I just want to have a common agreement that we would like to move in that direction.
> > > > > > > (As a side note, not just PG but Spark also has a similar approach, and I basically like the idea).
> > > > > > > If we go in that direction, savepoint metadata can be exposed in a way that one row represents an operator with its values, something like this:
> > > > > > >
> > > > > > > ┌─────────────────────────┬─────────────────────────────┬────────────────────────────────┬───────────┬──────────────┬──────────────────┬───────────────────────────┬──────────────────────┐
> > > > > > > │operatorName             │operatorUid                  │operatorHash                    │parallelism│maxParallelism│subtaskStatesCount│coordinatorStateSizeInBytes│totalStatesSizeInBytes│
> > > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > > │Source: datagen-source   │datagen-source-uid           │47aee94394d6ea26e2d544bef0a37bb5│2          │128           │2                 │16                         │546                   │
> > > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > > │long-udf-with-master-hook│long-udf-with-master-hook-uid│6ed3f40bff3c8dfcdfcb95128a1018f1│2          │128           │2                 │0                          │0                     │
> > > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > > │value-process            │value-process-uid            │ca4f5fe9a637b656f09ea78b3e7a15b9│2          │128           │2                 │0                          │40726                 │
> > > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > >
> > > > > > > This table can then be joined with the tables created by the existing `savepoint` connector based on the UID hash (which is unique and always exists).
> > > > > > > This would mean that the already existing table would need only a single metadata column, which is the UID hash.
> > > > > > > WDYT?
> > > > > > > @zakelly, plz share your thoughts too.
> > > > > > >
> > > > > > > > If we opt to use metadata columns, every record in the table would end up having identical values for these columns (please correct me if I’m mistaken). On the other hand, the state connector requires users to specify an operator UID or operator UID hash, after which it outputs user-defined values in its records. This approach feels somewhat redundant to me.
> > > > > > >
> > > > > > > If we add a new `savepoint-metadata` connector, then this can be addressed.
> > > > > > > On the other hand, UID and UID hash have an either-or relationship from a config perspective, so when a user provides the UID, he/she may be interested in the hash for further calculations (all of Flink's internals depend on the hash). Printing out the human-readable UID is an explicit requirement from the user side because hashes are not human readable.
> > > > > > >
> > > > > > > > 3. Handling LIST and MAP States in the State Connector
> > > > > > > > I have concerns about how the current design handles LIST and MAP states. Specifically, the state connector uses Flink SQL’s MAP and ARRAY types, which implies that it attempts to load entire MAP or LIST states into memory.
> > > > > > > >
> > > > > > > > However, in many real-world scenarios, these states can grow very large. Typically, the state API addresses this by providing an iterator to traverse elements within the state incrementally. I’m unsure whether I’ve missed something in FLIP-496 or FLIP-512, but it seems that the current design might struggle with scalability in such cases.
> > > > > > >
> > > > > > > You see it right: the current implementation keeps the state for a single key in memory.
> > > > > > > Back in the day we considered this potential issue and concluded that this is not necessarily needed for the initial version and can be done as a later improvement.
> > > > > > >
> > > > > > > Up until now we've seen, even in TB savepoints, that the number of keys can be extremely huge but not the per-key state itself.
> > > > > > > But again, this is a good feature as-is and can be handled in a separate jira.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Shengkai
> > > > > > > >
> > > > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > > [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > > > > > > >
> > > > > > > > On Mon, Mar 3, 2025 at 02:00, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Zakelly,
> > > > > > > > >
> > > > > > > > > To keep things simple, the target is `METADATA VIRTUAL` as the keywords for definition.
> > > > > > > > > If it's not super complex, the latter (`METADATA FROM xxx VIRTUAL`) can be added too.
> > > > > > > > >
> > > > > > > > > BR,
> > > > > > > > > G
> > > > > > > > >
> > > > > > > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Gabor,
> > > > > > > > > >
> > > > > > > > > > +1 for this.
> > > > > > > > > >
> > > > > > > > > > Will the metadata column use `METADATA VIRTUAL` as keywords for definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the Kafka table?
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Zakelly
> > > > > > > > > >
> > > > > > > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > I'd like to start a discussion of FLIP-512: Add meta information to SQL state connector [1].
> > > > > > > > > > > Feel free to add your thoughts to make this feature better.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > > > > > > > >
> > > > > > > > > > > BR,
> > > > > > > > > > > G
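
For readers of the thread, the `${state-name}_ttl` naming convention proposed for `listReadableMetadata` could be sketched roughly as below. This is a minimal standalone Java sketch, not Flink code: the class `TtlMetadataKeys`, the method `ttlMetadataKeys`, the sample state names, and the `BIGINT` type are all hypothetical, and plain strings stand in for Flink's `DataType` so the example runs without Flink on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it only mimics the shape of what a
// listReadableMetadata() implementation might return.
public class TtlMetadataKeys {

    // Suffix taken from the ${state-name}_ttl convention in the thread.
    static final String TTL_SUFFIX = "_ttl";

    // Maps each state name to a metadata key "<state-name>_ttl".
    // The value is an assumed SQL type name (BIGINT), not something
    // the thread specifies.
    static Map<String, String> ttlMetadataKeys(List<String> stateNames) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String stateName : stateNames) {
            metadata.put(stateName + TTL_SUFFIX, "BIGINT");
        }
        return metadata;
    }

    public static void main(String[] args) {
        // State names would come from the table's columns and options;
        // here they are hard-coded for illustration.
        Map<String, String> keys =
                ttlMetadataKeys(List.of("MyValueState", "MyListState"));
        System.out.println(keys.keySet());
        // prints [MyValueState_ttl, MyListState_ttl]
    }
}
```

A `LinkedHashMap` is used so the metadata keys keep the order of the state columns, which makes the output predictable when the keys are listed back to the user.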