Hi!

@Zakelly Lan <zakelly....@gmail.com> I think what Gabor means is that users want predefined SQL scripts that perform state analysis tasks to debug or identify problems, for example a SQL script that joins the metadata table with the state and runs some analytics on it.
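To make that concrete, here is a minimal sketch of the kind of fixed analysis such a script would run. It uses plain Java collections as a stand-in for the two tables, so none of it is Flink API: `OperatorMeta`, `StateRow` and `avgBytesPerKey` are hypothetical names, and the column layout (one metadata row per operator uid, one state row per key) is an assumption made only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class StateAnalysisSketch {
    // Hypothetical row of a state metadata table: one row per operator.
    public record OperatorMeta(String operatorUid, long stateSizeBytes) {}

    // Hypothetical row of a keyed-state table: one row per key of an operator.
    public record StateRow(String operatorUid, String key) {}

    // Inner join of metadata and state on the operator uid, then
    // state size divided by the number of keys for each operator.
    public static Map<String, Long> avgBytesPerKey(List<OperatorMeta> meta, List<StateRow> state) {
        Map<String, Long> keysPerOperator = state.stream()
                .collect(Collectors.groupingBy(StateRow::operatorUid, Collectors.counting()));
        Map<String, Long> result = new TreeMap<>();
        for (OperatorMeta m : meta) {
            Long keys = keysPerOperator.get(m.operatorUid());
            if (keys != null) { // operators without state rows drop out of the join
                result.put(m.operatorUid(), m.stateSizeBytes() / keys);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<OperatorMeta> meta = List.of(
                new OperatorMeta("join-op", 3000L),
                new OperatorMeta("idle-op", 10L));
        List<StateRow> state = List.of(
                new StateRow("join-op", "k1"),
                new StateRow("join-op", "k2"),
                new StateRow("join-op", "k3"));
        System.out.println(avgBytesPerKey(meta, state));
    }
}
```

The analysis logic stays the same for every savepoint; only the input rows change. That is the property a SQL script over a metadata table would have.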
If we have a metadata table, then the SQL script that does this is fixed, and users can trigger it on demand by simply providing a new savepoint path. If we have a different mechanism to extract metadata that is not SQL native, then manual steps need to be executed, and a custom SQL script has to be written that adds the manually extracted metadata into the script.

Cheers,
Gyula

On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com> wrote:

> Hi all,
>
> Thanks for your answers! Getting everyone aligned on this topic is challenging, but it’s definitely worth the effort since it will help streamline things moving forward.
>
> @Gabor are you saying that users are using some scripts to define the SQL metadata connector and get the information, right? If so, would a CLI tool be more convenient? It's easy to invoke and can get the result swiftly. And there should be some other systems to track the checkpoint lineage and analyze if there are outliers in metadata (e.g. state size of one operator), right? Well, maybe I missed something so please correct me if I'm wrong.
>
> > I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database.
>
> @Gyula Well, this is a good point. From the perspective of a comprehensive SQL experience, I'd +1 for treating metadata as data. Although I doubt there is a need for processing metadata, I won't be against a separate connector.
>
> Regarding the CLI tool, I still think it’s worth implementing. Such a tool could provide savepoint information before resuming from a savepoint, which would enhance the user experience in CLI-based workflows. It would be good if someone could implement this feature. We shouldn’t worry about whether this tool might be retired in the future. Regardless of the SQL-based solution we eventually adopt, this capability will remain essential for CLI users.
> This is another topic.
>
> Best,
> Zakelly
>
> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> > Hi.
> >
> > After reading the doc[1], I think Spark provides a function for users to consume the metadata from the savepoint. In Flink SQL, similar functionality is implemented through Polymorphic Table Functions (PTF) as proposed in FLIP-440[2]. Below is a code example[3] illustrating this concept:
> >
> > ```
> > public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
> >     public void eval(Integer i, Boolean b) {
> >         collectObjects(i, b);
> >     }
> > }
> > ```
> >
> > ```
> > INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
> > ```
> >
> > So we can add a builtin function named `read_state_metadata` to read savepoint data.
> >
> > Best,
> > Shengkai
> >
> > [1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
> > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
> > [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
> >
> > Gyula Fóra <gyula.f...@gmail.com> wrote on Wed, Mar 19, 2025 at 18:37:
> >
> > > Hi All!
> > >
> > > Thank you for the answers and concerns from everyone.
> > >
> > > On the CLI vs State Metadata Connector/Table question I would also like to step back a little and look at the bigger picture.
> > >
> > > I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database. Most features and developments in recent years have gone this way.
> > >
> > > The State Metadata Table would be a natural and straightforward fit here. So from my side, +1 for that.
> > >
> > > However I could understand if we are not ready to add a new connector/format due to maintenance concerns (and general concerns about the design). If that's the issue, then we should spend more time on the design to get comfortable with the approach and seek feedback from the wider community.
> > >
> > > I am -1 for the CLI/tooling approach, as that will not provide the feature set we are looking for that is not already covered by the Java connector. And that approach would come with the same maintenance implications.
> > >
> > > Cheers
> > > Gyula
> > >
> > > On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >
> > > > Hi Zakelly, Shengkai
> > > >
> > > > Several topics are going on, so I'm adding the gist of my answers to each. If some topic is not touched, please highlight it.
> > > >
> > > > @Shengkai: I've read through all the previous FLIPs related to catalogs, and if we would like to keep the concepts there, then a one-to-one mapping relationship between savepoint and catalog is a reasonable direction. In short, I'm happy that you've highlighted this and agree as a whole. I've written it down previously, just want to double confirm that the state catalog is essential and planned. When we reach this point, your input is more than welcome.
> > > >
> > > > @Zakelly: We've tried the CLI and separate library approaches with users already, and these are not welcome because of the following:
> > > > * Users want to have automated tasks and not manual CLI/library output parsing. This can be hacked around but our experience is negative on this because it's just brittle.
> > > > * From a development perspective it's a much bigger effort than a connector (hard to test, packaging/version handling is an extra layer of complexity, external FS authentication is a pain for users, expecting them to download savepoints also)
> > > > * Purely personal opinion, but if we find better ways later, then retiring a CLI is not more lightweight than retiring a connector
> > > >
> > > > > It would be great if you give some examples on how user could leverage the separate connector to process the metadata.
> > > >
> > > > The simplest cases:
> > > > * give me the overgrowing state uids
> > > > * give me the not known (new or renamed) state uids
> > > > * give me the state uids where state size drastically dropped compared to a previous savepoint (accidental state loss)
> > > >
> > > > Since it was mentioned: as a general off-topic teaser, yeah, it would be good to have some sort of checkpoint/savepoint lineage or however we call it. Since we've not yet reached this point there are no technical details, it's more like a vision. It's a common pattern that jobs are physically running but somehow the state processing is stuck, and it would be good to add some way to find it out automatically. The important word here is automation and not manual evaluation, since handling 10k+ jobs just does not allow that.
> > > >
> > > > BR,
> > > > G
> > > >
> > > > On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > >
> > > > > Hi, All.
> > > > >
> > > > > About State Catalog, I want to share more thoughts about this.
> > > > >
> > > > > In the initial design concept, I understood that a savepoint and a state catalog have a one-to-one mapping relationship.
> > > > > Each operator corresponds to a database, and the state of each operator is represented as individual tables. The rationale behind this design is:
> > > > >
> > > > > *State Diversity*: An operator may involve multiple types of states. For example, in our VVR design, a "multi-join" operator uses keyed states for two input streams and a broadcast state for the third stream. This makes it challenging to represent all states of an operator within a single table.
> > > > > *Scalability*: Internally, an operator might have multiple keyed states (e.g., value state and list state). However, large list states may not fit entirely in memory. To address this, we recommend implementing each state as a separate table.
> > > > >
> > > > > To resolve the loosely coupled relationships between operator states, we propose embedding predefined views within the catalog. These views simplify user understanding of operator implementations and provide a more intuitive perspective. For instance, a join operator may have multiple state implementations (depending on whether the join key includes unique attributes), but users primarily care about the data associated with a specific join key across input streams.
> > > > >
> > > > > Returning to the one-to-one mapping between savepoints and catalogs, we aim to manage multiple user state catalogs through a catalog store. When a user triggers a savepoint for a job on the platform:
> > > > >
> > > > > 1. The platform sends a REST request to the JobManager.
> > > > > 2. Simultaneously, it registers a new state catalog in the catalog store, enabling immediate analysis of state data on the platform.
> > > > > 3. Deleting a savepoint would also trigger the removal of its associated catalog.
> > > > >
> > > > > This vision assumes that states are self-describing or that a state metaservice is introduced to analyze savepoint structures.
> > > > >
> > > > > > How can users create logic to identify differences between multiple savepoints?
> > > > >
> > > > > Since savepoints and state catalogs are one-to-one mapped, users can query metadata via their respective catalogs. For example:
> > > > >
> > > > > 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>` provides operator-specific metadata (e.g., state size, type).
> > > > > 2. Comparing metadata tables (e.g., schema versions, state entry counts) across catalogs reveals structural or quantitative differences.
> > > > > 3. For deeper analysis, users could write SQL queries to compare specific state partitions or leverage the metaservice to track state evolution (e.g., added/removed operators, modified state configurations).
> > > > >
> > > > > If we plan to introduce a state catalog in the future, I would lean toward using metadata tables. If a utility tool can address the challenges we face, could we avoid introducing an additional connector?
> > > > >
> > > > > Best,
> > > > > Shengkai
> > > > >
> > > > > Gyula Fóra <gyula.f...@gmail.com> wrote on Mon, Mar 17, 2025 at 20:25:
> > > > >
> > > > > > Hi All!
> > > > > >
> > > > > > Without going into too much detail, here are my 2 cents regarding the virtual column / catalog metadata / table (connector) discussion for the State metadata.
> > > > > >
> > > > > > State metadata such as the types of states, their properties, names, sizes etc are all valuable information that can be used to enrich the computations we do on state.
> > > > > > We can either analyze it standalone (such as discovering anomalies for large jobs with many states), across multiple savepoints (discovering how state changed over time) or by joining it with keyed or non-keyed state data to serve more complex queries on the state.
> > > > > >
> > > > > > The only solution that seems to serve all these use-cases and requirements in a straightforward and SQL canonical way is to simply expose the state metadata as a separate table. This is a metadata table but you can also think of it as a data table; it makes no practical difference here.
> > > > > >
> > > > > > Once we have a catalog later, the catalog can offer this table out of the box, the same way databases provide metadata tables. For this to work, however, we need another, simpler connector that creates this table.
> > > > > >
> > > > > > +1 for state metadata as a separate connector/table, instead of adding virtual columns and ad hoc catalog metadata that is hard to use in a large number of queries.
> > > > > >
> > > > > > Cheers,
> > > > > > Gyula
> > > > > >
> > > > > > On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > >
> > > > > > > 1. State TTL for Value Columns
> > > > > > >
> > > > > > > > I’m planning on adding this, and we may collaborate on it in the future.
> > > > > > >
> > > > > > > +1 on this, just ping me.
> > > > > > >
> > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > >
> > > > > > > After some code digging and a POC, all I can say is that with heavy effort we can maybe add such changes that we're able to show metadata of a savepoint from the catalog.
> > > > > > > I'm not against that, but from a user perspective this has limited value; let me explain why.
> > > > > > >
> > > > > > > From a high-level perspective I see the following, on which I see agreement:
> > > > > > > * We should have a catalog which represents the savepoint data set of one or more jobs (future plan)
> > > > > > > * Savepoints should be able to be registered in the catalog, which are then databases (future plan)
> > > > > > > * There must be a possibility to create tables from databases where users can read state data (exists already)
> > > > > > >
> > > > > > > In terms of metadata, if I understand correctly, the suggested approach would be to access it from the catalog describe command, right? Adding that info when a specific database describe command is executed could be done.
> > > > > > >
> > > > > > > The question is, for instance, how can users create logic that tells them what the difference is between multiple savepoints?
> > > > > > > Just to give some examples:
> > > > > > > * per operator size changes between savepoints
> > > > > > > * show values from operator data where state size reaches a boundary
> > > > > > > * in general "find which checkpoint ruined things" is quite a common pattern
> > > > > > > What I would like to highlight here is that from the Flink point of view the metadata can be considered as static side output information, but for users these values are actual real data around which logic is planned to be built.
> > > > > > >
> > > > > > > > The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill.
> > > > > > >
> > > > > > > State data is also static within a savepoint, and that's the reason why the state processor API works in batch mode. When we handle multiple checkpoints in a streaming fashion, then this can be viewed from another angle.
> > > > > > >
> > > > > > > We can come up with a more lightweight solution than a new connector, but forcing users to parse the catalog describe command output in order to compare multiple savepoints doesn't sound like a smooth user experience. Honestly, I have no other idea how to expose metadata as real user data, so I'm waiting on other approaches.
> > > > > > >
> > > > > > > BR,
> > > > > > > G
> > > > > > >
> > > > > > > On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Looking forward to hearing the good news!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Shengkai
> > > > > > > >
> > > > > > > > Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Wed, Mar 12, 2025 at 22:24:
> > > > > > > >
> > > > > > > > > Thanks to both of you for the valuable input!
> > > > > > > > >
> > > > > > > > > Let me take a closer look at the suggestions, like the Catalog capabilities and the possibility of embedding TypeInformation or StateDescriptor metadata directly into the raw state files...
> > > > > > > > >
> > > > > > > > > BR,
> > > > > > > > > G
> > > > > > > > >
> > > > > > > > > On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for Zakelly's clarification.
> > > > > > > > > >
> > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > >
> > > > > > > > > > +1 to delay the discussion about this.
> > > > > > > > > >
> > > > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > > > >
> > > > > > > > > > I’d like to share my perspective on the State Catalog proposal. While introducing this capability is beneficial, there is a blocker: the current StateBackend architecture does not permit operators to encode TypeInformation into the state; it only preserves the Serializer. This limitation creates an asymmetry, as operators alone retain knowledge of the data structure’s schema.
> > > > > > > > > >
> > > > > > > > > > To address this, I suggest allowing operators to embed TypeInformation or StateDescriptor metadata directly into the raw state files. Such a design would enable the Catalog to:
> > > > > > > > > >
> > > > > > > > > > 1. Parse state files and programmatically derive the schema and structural guarantees for each state.
> > > > > > > > > > 2. Leverage existing Flink Table utilities, such as LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to bridge TypeInformation and DataType conversions.
> > > > > > > > > >
> > > > > > > > > > If we cannot store the TypeInformation or StateDescriptor into the raw state files, I am +1 for this FLIP to use metadata column to retrieve information.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Shengkai
> > > > > > > > > >
> > > > > > > > > > Zakelly Lan <zakelly....@gmail.com> wrote on Wed, Mar 12, 2025 at 12:43:
> > > > > > > > > >
> > > > > > > > > > > Hi Gabor and Shengkai,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for sharing your thoughts! This is a long discussion and sorry for the late reply (I'm busy catching up with release 2.0 these days).
> > > > > > > > > > >
> > > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > > >
> > > > > > > > > > > Let me first clarify your thoughts to ensure I understand correctly. IIUC, there is no persistent configuration for state TTL in the checkpoint. While you can infer that TTL is enabled by reading the serializer, the checkpoint itself only stores the last access time for each value. So the only thing we can show is the last access time for each value. But it is not required for all state backends to store this, as they may directly store the expired time. This will also increase the difficulty of implementation & maintenance.
> > > > > > > > > > >
> > > > > > > > > > > This once again reiterates the importance of unified metadata for checkpoints. I’m planning on adding this, and we may collaborate on it in the future.
> > > > > > > > > > >
> > > > > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > > > > >
> > > > > > > > > > > I'm not in favor of adding a new connector for metadata. The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill. It is not easy to withdraw a connector if we have a better solution in future. I'm not familiar with current Catalog capabilities, and if it could extract and show some operator-level information from savepoint, that would be great.
> > > > > > > > > > >
> > > > > > > > > > > If the Catalog can't do that, I would consider the current FLIP to be a compromise solution.
> > > > > > > > > > >
> > > > > > > > > > > And if we have that unified metadata for checkpoint/savepoint in future, we may directly register savepoint in catalog, and create a source without specifying complex columns, as well as describe the savepoint catalog to get the metadata. That's a good solution in my mind.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Zakelly
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Gabor,
> > > > > > > > > > > >
> > > > > > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > > > > > > >
> > > > > > > > > > > > I would argue against introducing a new connector type named savepoint-metadata, as the existing Catalog mechanism can inherently provide the necessary connector factory capabilities. I’ve detailed this proposal in branch[1]. Please take a moment to review it.
> > > > > > > > > > > >
> > > > > > > > > > > > If we introduce a connector named `savepoint-metadata`, it means users can create a temporary table with connector `savepoint-metadata`, and the connector needs to check whether the table schema is the same as the schema we proposed in the FLIP. On the other hand, it's not easy for others to use a metadata table with the same schema.
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Shengkai
> > > > > > > > > > > >
> > > > > > > > > > > > Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Tue, Mar 11, 2025 at 16:56:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Shengkai,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > > > > >
> > > > > > > > > > > > > From a directional perspective I agree with your idea of how it can be implemented. Previously I've mentioned that TTL information is not exposed on the state processor API (which the SQL state connector uses to read data), and unless somebody shows me the opposite, this FLIP is not going to address this, to avoid feature creep. Our users are also interested in TTL, so sooner or later we're going to expose it; this is a matter of scheduling.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > > > > > > > >
> > > > > > > > > > > > > Not sure I understand your point at all related to StateCatalog.
> > > > > > > > > > > > > First of all, I can't agree more that StateCatalog is needed and is a planned building block in an upcoming FLIP, but I'm not sure how it can help now. No matter what, your knowledge is essential when we add StateCatalog. Let me lay out my understanding in this area:
> > > > > > > > > > > > > * First we need create table statements to access state data and metadata
> > > > > > > > > > > > > * When we have that, then we can add StateCatalog, which could potentially ease the life of users by e.g. giving off-the-shelf tables without sweating with create table statements
> > > > > > > > > > > > >
> > > > > > > > > > > > > User expectations:
> > > > > > > > > > > > > * See state data (this is fulfilled with the existing connector)
> > > > > > > > > > > > > * See metadata about state data like TTL (this can be added as metadata column as you suggested since it belongs to the data)
> > > > > > > > > > > > > * See metadata about operators (this can be added from savepoint-metadata)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Important to highlight that the state data table format differs from the state metadata table format. Namely, one table has rows for state values and another has rows for operators, right?
> > > > > > > > > > > > > I think that's the reason why you've pinpointed that the suggested metadata columns are somewhat clunky.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As a conclusion, I agree to add the ${state-name}_ttl metadata column later on, since it belongs to the state value, and to add a new table type (like you suggested, similar to PG [1]) for metadata. Please see how Spark does that too [2].
> > > > > > > > > > > > >
> > > > > > > > > > > > > If you have a better approach, then please elaborate with more details and help me to understand your point.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself.
> > > > > > > > > > > > > > But again, this is a good feature as-is and can be handled in a separate jira.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've just created https://issues.apache.org/jira/browse/FLINK-37456.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > > > > > > > [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
> > > > > > > > > > > > >
> > > > > > > > > > > > > BR,
> > > > > > > > > > > > > G
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, Gabor. Thanks for your response.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you for addressing the limitations here. However, I believe it would be beneficial to further clarify the API in this FLIP regarding how users can specify the TTL column.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > One potential approach that comes to mind is using a standardized naming convention such as ${state-name}_ttl for the metadata column that defines the TTL value. In terms of implementation, the listReadableMetadata function could:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Read the table’s columns and configuration,
> > > > > > > > > > > > > > 2. Extract all defined state names, and
> > > > > > > > > > > > > > 3. Return a structured list of metadata entries formatted as ${state-name}_ttl.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > WDYT?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Introducing a new connector type at this stage may unnecessarily complicate the system. Given that every table already belongs to a Catalog, which is designed to provide a Factory for building source or sink connectors, I propose integrating a dedicated StateCatalog instead. This approach would allow us to:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Leverage the Catalog’s existing capabilities to manage TTL metadata (e.g., state names and TTL logic) without duplicating functionality.
> > > > > > > > > > > > > > 2. Provide a unified interface for connector instantiation and metadata handling through the Catalog’s Factory pattern.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Would this design decision better align with our architecture’s extensibility and reduce redundancy?
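
The three listReadableMetadata steps proposed above can be sketched roughly as follows. This is a plain-Python illustration, not Flink code: the helper names, the rule that every non-key column maps to one named state, and the `BIGINT` type for the TTL value are all assumptions made for the example.

```python
from typing import Dict, List

TTL_SUFFIX = "_ttl"  # naming convention proposed in the thread


def extract_state_names(columns: List[str], key_column: str) -> List[str]:
    """Assumption: every non-key column of the table maps to one named state."""
    return [name for name in columns if name != key_column]


def list_ttl_metadata(columns: List[str], key_column: str,
                      ttl_type: str = "BIGINT") -> Dict[str, str]:
    """Return one '<state-name>_ttl' metadata entry per state (key -> type name)."""
    return {
        f"{state}{TTL_SUFFIX}": ttl_type
        for state in extract_state_names(columns, key_column)
    }


# Hypothetical table: one key column and two state-backed value columns.
ttl_metadata = list_ttl_metadata(["k", "MyValueState", "MyListState"], key_column="k")
print(ttl_metadata)
```

A real implementation would presumably return Flink data types rather than type-name strings; the string form only keeps the sketch self-contained.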
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Up until now we've seen, even in TB-sized savepoints, that the number of keys can be extremely large, but not the per-key state itself.
> > > > > > > > > > > > > > > But again, this is a good feature as-is and can be handled in a separate jira.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 for a separate jira.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Shengkai
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 10, 2025 at 7:05 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Shengkai,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please see my comments inline.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > BR,
> > > > > > > > > > > > > > > G
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi, Gabor. Thanks for the FLIP. I have some questions about it:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. State TTL for Value Columns
> > > > > > > > > > > > > > > > How can users retrieve the state TTL (Time-to-Live) for each value column? From my understanding of the current design, it seems that this functionality is not supported. Could you clarify if there are plans to address this limitation?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Since the state processor API does not yet expose this information, this would require several steps. First, support needs to be added to the state processor API, which can then be exposed on the SQL API. This is definitely a useful future improvement and can be handled in a separate jira.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. Metadata Table vs. Metadata Column
> > > > > > > > > > > > > > > > The metadata information described in the FLIP appears to be intended to describe the state files stored at a specific location. To me, this concept aligns more closely with system tables like pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Adding a new connector with `savepoint-metadata` is one way to create such functionality. I'm not against that; I just want us to have a common agreement that we would like to move in that direction. (As a side note, not just PG but also Spark has a similar approach, and I basically like the idea.)
> > > > > > > > > > > > > > > If we go in that direction, savepoint metadata can be exposed so that one row represents an operator with its values, something like this:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ┌─────────────────────────┬─────────────────────────────┬────────────────────────────────┬───────────┬──────────────┬──────────────────┬───────────────────────────┬──────────────────────┐
> > > > > > > > > > > > > > > │operatorName             │operatorUid                  │operatorHash                    │parallelism│maxParallelism│subtaskStatesCount│coordinatorStateSizeInBytes│totalStatesSizeInBytes│
> > > > > > > > > > > > > > > ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
> > > > > > > > > > > > > > > │Source: datagen-source   │datagen-source-uid           │47aee94394d6ea26e2d544bef0a37bb5│2          │128           │2                 │16                         │546                   │
> > > > > > > > > > > > > > > │long-udf-with-master-hook│long-udf-with-master-hook-uid│6ed3f40bff3c8dfcdfcb95128a1018f1│2          │128           │2                 │0                          │0                     │
> > > > > > > > > > > > > > > │value-process            │value-process-uid            │ca4f5fe9a637b656f09ea78b3e7a15b9│2          │128           │2                 │0                          │40726                 │
> > > > > > > > > > > > > > > └─────────────────────────┴─────────────────────────────┴────────────────────────────────┴───────────┴──────────────┴──────────────────┴───────────────────────────┴──────────────────────┘
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This table can then be joined with the tables created by the existing `savepoint` connector, based on the UID hash (which is unique and always exists). This would mean that the existing table would need only a single metadata column, which is the UID hash.
> > > > > > > > > > > > > > > WDYT?
> > > > > > > > > > > > > > > @zakelly, plz share your thoughts too.
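
The join on the UID hash described above can be illustrated with a small in-memory sketch. The metadata rows reuse values from the example table in the thread; the state rows, their `key` and `value` columns, and the function name are hypothetical and only stand in for what a `savepoint` connector table might return.

```python
from typing import Dict, List

# Two of the metadata rows shown in the thread (one row per operator).
metadata_rows = [
    {"operatorName": "Source: datagen-source",
     "operatorHash": "47aee94394d6ea26e2d544bef0a37bb5",
     "totalStatesSizeInBytes": 546},
    {"operatorName": "value-process",
     "operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9",
     "totalStatesSizeInBytes": 40726},
]

# Hypothetical keyed-state rows carrying the UID hash as a single metadata column.
state_rows = [
    {"operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "key": 1, "value": 10},
    {"operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "key": 2, "value": 20},
]


def join_on_uid_hash(metadata: List[Dict], state: List[Dict]) -> List[Dict]:
    """Inner join: attach the operator-level metadata to every state row."""
    by_hash = {row["operatorHash"]: row for row in metadata}
    joined = []
    for row in state:
        meta = by_hash.get(row["operatorHash"])
        if meta is not None:  # drop state rows without a matching operator
            joined.append({**meta, **row})
    return joined


joined = join_on_uid_hash(metadata_rows, state_rows)
for row in joined:
    print(row["operatorName"], row["key"], row["totalStatesSizeInBytes"])
```

Because the hash is unique per operator, a dictionary lookup is enough here; in SQL the same idea would be an equi-join on the hash column.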
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If we opt to use metadata columns, every record in the table would end up having identical values for these columns (please correct me if I’m mistaken). On the other hand, the state connector requires users to specify an operator UID or operator UID hash, after which it outputs user-defined values in its records. This approach feels somewhat redundant to me.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If we add a new `savepoint-metadata` connector, this can be addressed. On the other hand, UID and UID hash have an either-or relationship from a config perspective, so when a user provides the UID, they may be interested in the hash for further calculations (all of the Flink internals depend on the hash). Printing out the human-readable UID is an explicit requirement from the user side, because hashes are not human readable.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3. Handling LIST and MAP States in the State Connector
> > > > > > > > > > > > > > > > I have concerns about how the current design handles LIST and MAP states. Specifically, the state connector uses Flink SQL’s MAP and ARRAY types, which implies that it attempts to load entire MAP or LIST states into memory.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > However, in many real-world scenarios, these states can grow very large. Typically, the state API addresses this by providing an iterator to traverse elements within the state incrementally. I’m unsure whether I’ve missed something in FLIP-496 or FLIP-512, but it seems that the current design might struggle with scalability in such cases.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You see it correctly: the current implementation keeps the state for a single key in memory. Back in the day, we considered this potential issue and concluded that it is not necessarily needed for the initial version and can be done as a later improvement.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Up until now we've seen, even in TB-sized savepoints, that the number of keys can be extremely large, but not the per-key state itself.
> > > > > > > > > > > > > > > But again, this is a good feature as-is and can be handled in a separate jira.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Shengkai
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> > > > > > > > > > > > > > > > [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Mar 3, 2025 at 2:00 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Zakelly,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > To keep things simple, `METADATA VIRTUAL` as the keywords for definition is the target. If it's not super complex, the latter can be added too.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > BR,
> > > > > > > > > > > > > > > > > G
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Gabor,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1 for this.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Will the metadata column use `METADATA VIRTUAL` as keywords for definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the Kafka table?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > Zakelly
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I'd like to start a discussion of FLIP-512: Add meta information to SQL state connector [1]. Feel free to add your thoughts to make this feature better.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > BR,
> > > > > > > > > > > > > > > > > > > G