Hi All,

I also lack knowledge of PTFs, so I've read only the motivation part:
"The SQL 2016 standard introduced a way of defining custom SQL operators defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic table functions). ~200 pages define how this new kind of function can consume and produce tables with various execution properties. Unfortunately, this part of the standard is not publicly available."

Of course we can look at some examples, but do we really want to expose state data through a construct that takes ~200 pages to describe, in a part of the standard that is not publicly available? :)
The dataset is only a couple of rows, and the use case is joining it with another table, such as the state data.
If somebody can show the advantages I would buy that, but from my limited understanding this would be overkill here.

BR,
G

On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi Zakelly, Shengkai!
>
> I don't know too much about PTFs, it would be interesting to see how the usage would look in practice.
>
> Do you have some mockup/example in mind of how the PTF would look, for example when we want to:
> - Simply display/aggregate what's in the metadata
> - Join keyed state with some metadata columns
>
> Thanks
> Gyula
>
> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I'm fine with a separate SQL connector for metadata, so maybe we could update the FLIP about our discussion? And Shengkai provides a PTF implementation, does that also meet the requirement?
> >
> >
> > Best,
> > Zakelly
> >
> > On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > @Zakelly: Gyula summarised correctly what I meant, so please treat the content as mine.
> > > As an addition, I'm not against adding a CLI at all. I'm just stating that in some cases like this, users would like to have a self-serving solution where they can provide SQL statements which can trigger alerts automatically.
> > >
> > > My personal opinion is that CLI would be beneficial for several cases. A good example is when users want to restart job from specific Kafka offsets which are persisted in a savepoint. For such scenario users are more than happy since they expect manual intervention with full control. So all in all one can count on my +1 when CLI FLIP would come up...
> > >
> > > BR,
> > > G
> > >
> > >
> > > On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > >
> > >> Hi!
> > >>
> > >> @Zakelly Lan <zakelly....@gmail.com>
> > >> I think what Gabor means is that users want to have predefined SQL scripts to perform state analysis tasks to debug/identify problems.
> > >> Such as write a SQL script that joins the metadata table with the state and do some analytics on it.
> > >>
> > >> If we have a meta table then the SQL script that can do this is fixed and users can trigger this on demand by simply providing a new savepoint path.
> > >>
> > >> If we have a different mechanism to extract metadata that is not SQL native then manual steps need to be executed and a custom SQL script would need to be written that adds the manually extracted metadata into the script.
> > >>
> > >> Cheers,
> > >> Gyula
> > >>
> > >> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > Thanks for your answers! Getting everyone aligned on this topic is challenging, but it's definitely worth the effort since it will help streamline things moving forward.
> > >> >
> > >> > @Gabor are you saying that users are using some scripts to define the SQL metadata connector and get the information, right? If so, would a CLI tool be more convenient? It's easy to invoke and can get the result swiftly.
> > >> > And there should be some other systems to track the checkpoint lineage and analyze if there are outliers in metadata (e.g. state size of one operator) right? Well, maybe I missed something so please correct me if I'm wrong.
> > >> >
> > >> > > I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database.
> > >> >
> > >> >
> > >> > @Gyula Well, this is a good point. From the perspective of comprehensive SQL experience, I'd +1 for treating metadata as data. Although I doubt if there is a need for processing metadata, I won't be against a separate connector.
> > >> >
> > >> > Regarding the CLI tool, I still think it's worth implementing. Such a tool could provide savepoint information before resuming from a savepoint, which would enhance the user experience in CLI-based workflows. It would be good if someone could implement this feature. We shouldn't worry about whether this tool might be retired in the future. Regardless of the SQL-based solution we eventually adopt, this capability will remain essential for CLI users. This is another topic.
> > >> >
> > >> >
> > >> > Best,
> > >> > Zakelly
> > >> >
> > >> >
> > >> > On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >> >
> > >> > > Hi.
> > >> > >
> > >> > > After reading the doc[1], I think Spark provides a function for users to consume the metadata from the savepoint. In Flink SQL, similar functionality is implemented through Polymorphic Table Functions (PTF) as proposed in FLIP-440[2].
> > >> > > Below is a code example[3] illustrating this concept:
> > >> > >
> > >> > > ```
> > >> > > // Scalar-argument PTF from Flink's planner tests; it forwards its two
> > >> > > // arguments to the test collector.
> > >> > > public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
> > >> > >     public void eval(Integer i, Boolean b) {
> > >> > >         collectObjects(i, b);
> > >> > >     }
> > >> > > }
> > >> > > ```
> > >> > >
> > >> > > ```
> > >> > > INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
> > >> > > ```
> > >> > >
> > >> > > So we can add a builtin function named `read_state_metadata` to read savepoint data.
> > >> > >
> > >> > > Best,
> > >> > > Shengkai
> > >> > >
> > >> > > [1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
> > >> > > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
> > >> > > [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
> > >> > >
> > >> > > Gyula Fóra <gyula.f...@gmail.com> wrote on Wed, Mar 19, 2025 at 18:37:
> > >> > >
> > >> > > > Hi All!
> > >> > > >
> > >> > > > Thank you for the answers and concerns from everyone.
> > >> > > >
> > >> > > > On the CLI vs State Metadata Connector/Table question I would also like to step back a little and look at the bigger picture.
> > >> > > >
> > >> > > > I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database.
> > >> > > > Most features and developments in recent years have gone this way.
> > >> > > >
> > >> > > > The State Metadata Table would be a natural and straightforward fit here.
> > >> > > > So from my side, +1 for that.
> > >> > > >
> > >> > > > However I could understand if we are not ready to add a new connector/format due to maintenance concerns (and in general concern about the design).
> > >> > > > If that's the issue then we should spend more time on the design to get comfortable with the approach and seek feedback from the wider community.
> > >> > > >
> > >> > > > I am -1 for the CLI/tooling approach as that will not provide the featureset we are looking for that is not already covered by the Java connector. And that approach would come with the same maintenance implications.
> > >> > > >
> > >> > > > Cheers
> > >> > > > Gyula
> > >> > > >
> > >> > > >
> > >> > > > On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >> > > >
> > >> > > > > Hi Zakelly, Shengkai
> > >> > > > >
> > >> > > > > Several topics are going on, so I'm adding short answers to each. If some topic is not touched, please highlight it.
> > >> > > > >
> > >> > > > > @Shengkai: I've read through all the previous FLIPs related to catalogs, and if we would like to keep the concepts there, then a one-to-one mapping between savepoint and catalog is a reasonable direction. In short, I'm happy that you've highlighted this and I agree as a whole. I've written it down previously; I just want to double confirm that the state catalog is essential and planned. When we reach this point, your input is more than welcome.
> > >> > > > >
> > >> > > > > @Zakelly: We've already tried the CLI and separate library approaches with users, and these were not welcome because of the following:
> > >> > > > > * Users want automated tasks, not manual parsing of CLI/library output. This can be hacked around, but our experience with it is negative because it's just brittle.
> > >> > > > > * From a development perspective it's a much bigger effort than a connector (hard to test, packaging/version handling is an extra layer of complexity, external FS authentication is a pain for users, and they are also expected to download savepoints)
> > >> > > > > * Purely personal opinion, but if we find better ways later, retiring a CLI is not more lightweight than retiring a connector
> > >> > > > >
> > >> > > > > > It would be great if you give some examples on how user could leverage the separate connector to process the metadata.
> > >> > > > >
> > >> > > > > The simplest cases:
> > >> > > > > * give me the overgrowing state uids
> > >> > > > > * give me the unknown (new or renamed) state uids
> > >> > > > > * give me the state uids where the state size dropped drastically compared to a previous savepoint (accidental state loss)
> > >> > > > >
> > >> > > > > Since it was mentioned: as a general off-topic teaser, yeah, it would be good to have some sort of checkpoint/savepoint lineage, or whatever we call it. Since we've not yet reached this point there are no technical details, it's more like a vision.
> > >> > > > > It's a common pattern that jobs are physically running but somehow the state processing is stuck and it would be good to add some way to find it out automatically.
> > >> > > > > The important point here is automation rather than manual evaluation, since handling 10k+ jobs simply does not allow the latter.
> > >> > > > >
> > >> > > > > BR,
> > >> > > > > G
> > >> > > > >
> > >> > > > >
> > >> > > > > On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > Hi, All.
> > >> > > > > >
> > >> > > > > > About State Catalog, I want to share more thoughts about this.
> > >> > > > > >
> > >> > > > > > In the initial design concept, I understood that a savepoint and a state catalog have a one-to-one mapping relationship. Each operator corresponds to a database, and the state of each operator is represented as individual tables. The rationale behind this design is:
> > >> > > > > >
> > >> > > > > > *State Diversity*: An operator may involve multiple types of states. For example, in our VVR design, a "multi-join" operator uses keyed states for two input streams and a broadcast state for the third stream. This makes it challenging to represent all states of an operator within a single table.
> > >> > > > > > *Scalability*: Internally, an operator might have multiple keyed states (e.g., value state and list state). However, large list states may not fit entirely in memory. To address this, we recommend implementing each state as a separate table.
> > >> > > > > >
> > >> > > > > > To resolve the loosely coupled relationships between operator states, we propose embedding predefined views within the catalog. These views simplify user understanding of operator implementations and provide a more intuitive perspective. For instance, a join operator may have multiple state implementations (depending on whether the join key includes unique attributes), but users primarily care about the data associated with a specific join key across input streams.
> > >> > > > > >
> > >> > > > > > Returning to the one-to-one mapping between savepoints and catalogs, we aim to manage multiple user state catalogs through a catalog store. When a user triggers a savepoint for a job on the platform:
> > >> > > > > >
> > >> > > > > > 1. The platform sends a REST request to the JobManager.
> > >> > > > > > 2. Simultaneously, it registers a new state catalog in the catalog store, enabling immediate analysis of state data on the platform.
> > >> > > > > > 3. Deleting a savepoint would also trigger the removal of its associated catalog.
> > >> > > > > >
> > >> > > > > > This vision assumes that states are self-describing or that a state metaservice is introduced to analyze savepoint structures.
> > >> > > > > >
> > >> > > > > > > How can users create logic to identify differences between multiple savepoints?
> > >> > > > > >
> > >> > > > > > Since savepoints and state catalogs are one-to-one mapped, users can query metadata via their respective catalogs.
> > >> > > > > > For example:
> > >> > > > > >
> > >> > > > > > 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>` provides operator-specific metadata (e.g., state size, type).
> > >> > > > > > 2. Comparing metadata tables (e.g., schema versions, state entry counts) across catalogs reveals structural or quantitative differences.
> > >> > > > > > 3. For deeper analysis, users could write SQL queries to compare specific state partitions or leverage the metaservice to track state evolution (e.g., added/removed operators, modified state configurations).
> > >> > > > > >
> > >> > > > > > If we plan to introduce a state catalog in the future, I would lean toward using metadata tables. If a utility tool can address the challenges we face, could we avoid introducing an additional connector?
> > >> > > > > >
> > >> > > > > > Best,
> > >> > > > > > Shengkai
> > >> > > > > >
> > >> > > > > > Gyula Fóra <gyula.f...@gmail.com> wrote on Mon, Mar 17, 2025 at 20:25:
> > >> > > > > >
> > >> > > > > > > Hi All!
> > >> > > > > > >
> > >> > > > > > > Without going into too much detail here are my 2 cents regarding the virtual column / catalog metadata / table (connector) discussion for the State metadata.
> > >> > > > > > >
> > >> > > > > > > State metadata such as the types of states, their properties, names, sizes etc are all valuable information that can be used to enrich the computations we do on state.
> > >> > > > > > > We can either analyze it standalone (such as discovering anomalies in large jobs with many states), across multiple savepoints (discovering how state changed over time), or by joining it with keyed or non-keyed state data to serve more complex queries on the state.
> > >> > > > > > >
> > >> > > > > > > The only solution that seems to serve all these use-cases and requirements in a straightforward and SQL canonical way is to simply expose the state metadata as a separate table. This is a metadata table but you can also think of it as a data table, it makes no practical difference here.
> > >> > > > > > >
> > >> > > > > > > Once we have a catalog later, the catalog can offer this table out of the box, the same way databases provide metadata tables. For this to work, however, we need another, simpler connector that creates this table.
> > >> > > > > > >
> > >> > > > > > > +1 for state metadata as a separate connector/table, instead of adding virtual columns and ad hoc catalog metadata that is hard to use in a large number of queries.
> > >> > > > > > >
> > >> > > > > > > Cheers,
> > >> > > > > > > Gyula
> > >> > > > > > >
> > >> > > > > > > On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >> > > > > > >
> > >> > > > > > > > 1. State TTL for Value Columns
> > >> > > > > > > >
> > >> > > > > > > > > I'm planning on adding this, and we may collaborate on it in the future.
> > >> > > > > > > >
> > >> > > > > > > > +1 on this, just ping me.
> > >> > > > > > > >
> > >> > > > > > > > 2. Metadata Table vs. Metadata Column
> > >> > > > > > > >
> > >> > > > > > > > After some code digging and a POC, all I can say is that with heavy effort we can maybe make changes that let us show the metadata of a savepoint from the catalog.
> > >> > > > > > > > I'm not against that, but from the user perspective this has limited value. Let me explain why.
> > >> > > > > > > >
> > >> > > > > > > > From a high-level perspective I see the following, which I think we agree on:
> > >> > > > > > > > * We should have a catalog which represents the savepoint data set of one or more jobs (future plan)
> > >> > > > > > > > * It should be possible to register savepoints in the catalog, where they then become databases (future plan)
> > >> > > > > > > > * There must be a possibility to create tables from databases where users can read state data (exists already)
> > >> > > > > > > >
> > >> > > > > > > > In terms of metadata, if I understand correctly, the suggested approach would be to access it from the catalog describe command, right? Adding that info when a specific database describe command is executed could be done.
> > >> > > > > > > >
> > >> > > > > > > > The question is for instance how can users create such a logic that tells them what is the difference between multiple savepoints?
> > >> > > > > > > > Just to give some examples:
> > >> > > > > > > > * per operator size changes between savepoints
> > >> > > > > > > > * show values from operator data where state size reaches a boundary
> > >> > > > > > > > * in general "find which checkpoint ruined things" is quite common pattern
> > >> > > > > > > > What I would like to highlight here is that from Flink point of view the metadata can be considered as a static side output information but for users these values are actual real data where logic is planned to build around.
> > >> > > > > > > >
> > >> > > > > > > > > The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill.
> > >> > > > > > > >
> > >> > > > > > > > State data is also static within a savepoint and that's the reason why the state processor API is working in batch mode.
> > >> > > > > > > > When we handle multiple checkpoints in a streaming fashion then this can be viewed from another angle.
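
The savepoint comparisons listed above can be sketched with plain SQL once the metadata is available as a table. The snippet below is only an illustration, not the FLIP's design: it uses Python with an in-memory SQLite database as a stand-in for Flink SQL, and the table name `state_metadata`, its columns, the savepoint names and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical metadata table: one row per operator uid per savepoint.
# Neither the table name nor the columns come from the FLIP.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_metadata ("
    "savepoint TEXT, operator_uid TEXT, state_size_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO state_metadata VALUES (?, ?, ?)",
    [
        ("sp-1", "source", 1000),
        ("sp-1", "join", 50000),
        ("sp-1", "sink", 2000),
        ("sp-2", "source", 1100),
        ("sp-2", "join", 400),     # state size dropped drastically
        ("sp-2", "enrich", 3000),  # uid not present in sp-1
    ],
)


def size_changes(conn, old, new):
    """Return (uid, old_size, new_size) for uids present in both savepoints."""
    return conn.execute(
        "SELECT n.operator_uid, o.state_size_bytes, n.state_size_bytes "
        "FROM state_metadata n "
        "JOIN state_metadata o ON o.operator_uid = n.operator_uid "
        "WHERE o.savepoint = ? AND n.savepoint = ? "
        "ORDER BY n.operator_uid",
        (old, new),
    ).fetchall()


def unknown_uids(conn, old, new):
    """Return uids that exist in the new savepoint but not in the old one."""
    rows = conn.execute(
        "SELECT operator_uid FROM state_metadata WHERE savepoint = ? "
        "AND operator_uid NOT IN ("
        "SELECT operator_uid FROM state_metadata WHERE savepoint = ?) "
        "ORDER BY operator_uid",
        (new, old),
    ).fetchall()
    return [uid for (uid,) in rows]


def shrunk_uids(conn, old, new, ratio=0.5):
    """Return uids whose state shrank below `ratio` times its previous size."""
    return [
        uid
        for uid, before, after in size_changes(conn, old, new)
        if after < before * ratio
    ]
```

With a table like this the queries stay fixed and only the savepoint identifiers change between runs, which is the property the automation argument relies on.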
> > >> > > > > > > >
> > >> > > > > > > > We can come up with a more lightweight solution than a new connector, but forcing users to parse the catalog describe command output in order to compare multiple savepoints doesn't sound like a smooth user experience.
> > >> > > > > > > > Honestly, I have no other idea for exposing metadata as real user data, so I'm waiting for other approaches.
> > >> > > > > > > >
> > >> > > > > > > > BR,
> > >> > > > > > > > G
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Looking forward to hearing the good news!
> > >> > > > > > > > >
> > >> > > > > > > > > Best,
> > >> > > > > > > > > Shengkai
> > >> > > > > > > > >
> > >> > > > > > > > > Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Wed, Mar 12, 2025 at 22:24:
> > >> > > > > > > > >
> > >> > > > > > > > > > Thanks to both of you for the valuable input!
> > >> > > > > > > > > >
> > >> > > > > > > > > > Let me take a closer look at the suggestions, like the Catalog capabilities and possibility of embedding TypeInformation or StateDescriptor metadata directly into the raw state files...
> > >> > > > > > > > > >
> > >> > > > > > > > > > BR,
> > >> > > > > > > > > > G
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Thanks for Zakelly's clarification.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > 1. State TTL for Value Columns
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > +1 to delay the discussion about this.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > 2. Metadata Table vs. Metadata Column
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I'd like to share my perspective on the State Catalog proposal. While introducing this capability is beneficial, there is a blocker: the current StateBackend architecture does not permit operators to encode TypeInformation into the state; it only preserves the Serializer. This limitation creates an asymmetry, as operators alone retain knowledge of the data structure's schema.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > To address this, I suggest allowing operators to embed TypeInformation or StateDescriptor metadata directly into the raw state files. Such a design would enable the Catalog to:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > 1. Parse state files and programmatically derive the schema and structural guarantees for each state.
> > >> > > > > > > > > > > 2. Leverage existing Flink Table utilities, such as LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to bridge TypeInformation and DataType conversions.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > If we can not store the TypeInformation or StateDescriptor into the raw state files, I am +1 for this FLIP to use metadata column to retrieve information.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Best,
> > >> > > > > > > > > > > Shengkai
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Zakelly Lan <zakelly....@gmail.com> wrote on Wed, Mar 12, 2025 at 12:43:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > Hi Gabor and Shengkai,
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Thanks for sharing your thoughts! This is a long discussion and sorry for the late reply (I'm busy catching up with release 2.0 these days).
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > 1. State TTL for Value Columns
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Let me first clarify your thoughts to ensure I understand correctly. IIUC, there is no persistent configuration for state TTL in the checkpoint. While you can infer that TTL is enabled by reading the serializer, the checkpoint itself only stores the last access time for each value. So the only thing we can show is the last access time for each value.
> > >> > > > > > > > > > > > But it is not required for all state backends to store this, as they may directly store the expired time. This will also increase the difficulty of implementation & maintenance.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > This once again reiterates the importance of unified metadata for checkpoints. I'm planning on adding this, and we may collaborate on it in the future.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > 2. Metadata Table vs. Metadata Column
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > I'm not in favor of adding a new connector for metadata. The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill. It is not easy to withdraw a connector if we have a better solution in future.
> > >> > > > > > > > > > > > I'm not familiar with current Catalog capabilities, and if it could extract and show some operator-level information from savepoint, that would be great.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > If the Catalog can't do that, I would consider the current FLIP to be a compromise solution.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > And if we have that unified metadata for checkpoint/savepoint in future, we may directly register savepoint in catalog, and create a source without specifying complex columns, as well as describe the savepoint catalog to get the metadata. That's a good solution in my mind.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Best,
> > >> > > > > > > > > > > > Zakelly
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > > Hi Gabor,
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > 2. Adding a new connector with `savepoint-metadata`
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > I would argue against introducing a new connector type named savepoint-metadata, as the existing Catalog mechanism can inherently provide the necessary connector factory capabilities. I've detailed this proposal in branch[1]. Please take a moment to review it.
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > If we introduce a connector named `savepoint-metadata`, it means a user can create a temporary table with connector `savepoint-metadata`, and the connector needs to check whether the table schema is the same as the schema we proposed in the FLIP. On the other hand, it's not easy work for others to use a metadata table with the same schema.
> > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > [1] > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63 > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > Best, > > >> > > > > > > > > > > > > Shengkai > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > Gabor Somogyi <gabor.g.somo...@gmail.com> > > >> > > δΊ2025εΉ΄3ζ11ζ₯ε¨δΊ > > >> > > > > > > 16:56ειοΌ > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Hi Shengkai, > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 1. State TTL for Value Columns > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > From directional perspective I agree your > idea > > >> how > > >> > it > > >> > > > can > > >> > > > > > be > > >> > > > > > > > > > > > implemented. > > >> > > > > > > > > > > > > > Previously I've mentioned that TTL > information > > >> is > > >> > not > > >> > > > > > exposed > > >> > > > > > > > on > > >> > > > > > > > > > the > > >> > > > > > > > > > > > > state > > >> > > > > > > > > > > > > > processor API (which the SQL state connector > > >> uses > > >> > to > > >> > > > read > > >> > > > > > > data) > > >> > > > > > > > > > > > > > and unless somebody show me the opposite > this > > >> FLIP > > >> > is > > >> > > > not > > >> > > > > > > going > > >> > > > > > > > > to > > >> > > > > > > > > > > > > address > > >> > > > > > > > > > > > > > this to avoid feature creep. 
Our users are also interested in TTL, so sooner or later we're going to expose it; this is a matter of scheduling.

> 2. Adding a new connector with `savepoint-metadata`

I'm not sure I understand your point about StateCatalog. First of all, I can't agree more that StateCatalog is needed and is a planned building block in an upcoming FLIP, but I'm not sure how it can help now. No matter what, your knowledge is essential when we add StateCatalog. Let me lay out my understanding in this area:
* First we need create table statements to access state data and metadata
* When we have that, we can add StateCatalog, which could potentially ease the life of users by e.g. giving off-the-shelf tables without sweating with create table statements

User expectations:
* See state data (this is fulfilled by the existing connector)
* See metadata about state data like TTL (this can be added as a metadata column as you suggested, since it belongs to the data)
* See metadata about operators (this can be added from savepoint-metadata)

It's important to highlight that the state data table format differs from the state metadata table format. Namely, one table has rows for state values and the other has rows for operators, right? I think that's the reason why you've pointed out that the suggested metadata columns are somewhat clunky.

As a conclusion, I agree to add the ${state-name}_ttl metadata column later on, since it belongs to the state value, and to add a new table type for metadata (like you suggested, similar to PG [1]). Please see how Spark does that too [2].

If you have a better approach, then please elaborate with more details and help me understand your point.

> Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself.
> But again, this is a good feature as-is and can be handled in a separate jira.

I've just created https://issues.apache.org/jira/browse/FLINK-37456.
[1] https://www.postgresql.org/docs/current/view-pg-tables.html
[2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source

BR,
G

On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi, Gabor. Thanks for your response.

> 1. State TTL for Value Columns

Thank you for addressing the limitations here. However, I believe it would be beneficial to further clarify the API in this FLIP regarding how users can specify the TTL column.
One potential approach that comes to mind is using a standardized naming convention such as ${state-name}_ttl for the metadata column that defines the TTL value. In terms of implementation, the listReadableMetadata function could:

1. Read the table's columns and configuration,
2. Extract all defined state names, and
3. Return a structured list of metadata entries formatted as ${state-name}_ttl.

WDYT?

> 2. Adding a new connector with `savepoint-metadata`

Introducing a new connector type at this stage may unnecessarily complicate the system. Given that every table already belongs to a Catalog, which is designed to provide a Factory for building source or sink connectors, I propose integrating a dedicated StateCatalog instead. This approach would allow us to:

1. Leverage the Catalog's existing capabilities to manage TTL metadata (e.g., state names and TTL logic) without duplicating functionality.
2. Provide a unified interface for connector instantiation and metadata handling through the Catalog's Factory pattern.

Would this design decision better align with our architecture's extensibility and reduce redundancy?

> Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself.
> But again, this is a good feature as-is and can be handled in a separate jira.

+1 for a separate jira.
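The `${state-name}_ttl` naming convention proposed above can be sketched roughly as follows. This is only a plain-Python model of the three steps, not Flink code: the function names, the `fields.<column>.state-name` option key, and the example column names are assumptions made for illustration, and a real `listReadableMetadata` implementation would return metadata keys together with their data types rather than a bare list of strings.

```python
def extract_state_names(value_columns, options):
    """Steps 1 and 2: read the table's columns and configuration and resolve
    each value column to its state name.

    Assumption: a column maps to a state of the same name unless the
    (hypothetical) option 'fields.<column>.state-name' overrides it.
    """
    return [
        options.get(f"fields.{column}.state-name", column)
        for column in value_columns
    ]


def list_ttl_metadata_keys(value_columns, options):
    """Step 3: return one metadata key per state, formatted as <state-name>_ttl."""
    return [f"{state}_ttl" for state in extract_state_names(value_columns, options)]


# Hypothetical table definition: two value columns, one of them renamed.
columns = ["k_count", "last_seen"]
table_options = {"fields.k_count.state-name": "count"}
ttl_keys = list_ttl_metadata_keys(columns, table_options)
```

With this model, a user would declare the keys in `ttl_keys` as metadata columns in the table definition; whether TTL values can actually be read depends on the state processor API support discussed in this thread.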
Best,
Shengkai

Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 10, 2025 at 19:05:

Hi Shengkai,

Please see my comments inline.

BR,
G

On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:

> Hi, Gabor. Thanks for the FLIP. I have some questions about the FLIP:
>
> 1. State TTL for Value Columns
> How can users retrieve the state TTL (Time-to-Live) for each value column? From my understanding of the current design, it seems that this functionality is not supported. Could you clarify if there are plans to address this limitation?

Since the state processor API does not yet expose this information, this would require several steps. First, support needs to be added to the state processor API, which can then be exposed on the SQL API. This is definitely a useful future improvement and can be handled in a separate jira.

> 2. Metadata Table vs. Metadata Column
> The metadata information described in the FLIP appears to be intended to describe the state files stored at a specific location. To me, this concept aligns more closely with system tables like pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
Adding a new connector with `savepoint-metadata` is a possibility where we can create such functionality. I'm not against that, I just want to have a common agreement that we would like to move in that direction. (As a side note, not just PG but Spark also has a similar approach, and I basically like the idea.) If we go in that direction, savepoint metadata can be exposed in a way that one row represents an operator with its values, something like this:

┌─────────────────────────┬─────────────────────────────┬────────────────────────────────┬───────────┬──────────────┬──────────────────┬───────────────────────────┬──────────────────────┐
│operatorName             │operatorUid                  │operatorHash                    │parallelism│maxParallelism│subtaskStatesCount│coordinatorStateSizeInBytes│totalStatesSizeInBytes│
├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
│Source: datagen-source   │datagen-source-uid           │47aee94394d6ea26e2d544bef0a37bb5│2          │128           │2                 │16                         │546                   │
├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
│long-udf-with-master-hook│long-udf-with-master-hook-uid│6ed3f40bff3c8dfcdfcb95128a1018f1│2          │128           │2                 │0                          │0                     │
├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
│value-process            │value-process-uid            │ca4f5fe9a637b656f09ea78b3e7a15b9│2          │128           │2                 │0                          │40726                 │
└─────────────────────────┴─────────────────────────────┴────────────────────────────────┴───────────┴──────────────┴──────────────────┴───────────────────────────┴──────────────────────┘

This table can then be joined with the tables created by the existing `savepoint` connector based on the UID hash (which is unique and always exists). This would mean that the already existing table would need only a single metadata column, which is the UID hash.
WDYT?
@zakelly, plz share your thoughts too.
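The join on the UID hash described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the connector's behavior: the operator rows reuse values from the example metadata table, while the keyed-state rows, their `key` and `value` fields, and the `join_on_operator_hash` helper are hypothetical.

```python
# Operator-level rows, using values from the example metadata table.
operator_metadata = [
    {"operatorName": "Source: datagen-source",
     "operatorUid": "datagen-source-uid",
     "operatorHash": "47aee94394d6ea26e2d544bef0a37bb5",
     "totalStatesSizeInBytes": 546},
    {"operatorName": "value-process",
     "operatorUid": "value-process-uid",
     "operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9",
     "totalStatesSizeInBytes": 40726},
]

# Hypothetical keyed-state rows, as a `savepoint` table might return them
# if it exposed the operator hash as its single metadata column.
state_rows = [
    {"operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "key": 1, "value": 10},
    {"operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "key": 2, "value": 20},
]


def join_on_operator_hash(state_rows, operator_metadata):
    """Inner join: enrich each state row with the metadata of its operator."""
    by_hash = {meta["operatorHash"]: meta for meta in operator_metadata}
    joined = []
    for row in state_rows:
        meta = by_hash.get(row["operatorHash"])
        if meta is not None:  # drop state rows without a matching operator
            joined.append({**row, **meta})
    return joined


joined = join_on_operator_hash(state_rows, operator_metadata)
```

Because the hash is unique per operator, each state row matches at most one metadata row, so the join never multiplies state rows.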
> If we opt to use metadata columns, every record in the table would end up having identical values for these columns (please correct me if I'm mistaken). On the other hand, the state connector requires users to specify an operator UID or operator UID hash, after which it outputs user-defined values in its records. This approach feels somewhat redundant to me.

If we add a new `savepoint-metadata` connector then this can be addressed. On the other hand, UID and UID hash have an either-or relationship from a config perspective, so when a user provides the UID, they may be interested in the hash for further calculations (Flink internals depend entirely on the hash). Printing out the human-readable UID is an explicit requirement from the user side because hashes are not human readable.

> 3. Handling LIST and MAP States in the State Connector
> I have concerns about how the current design handles LIST and MAP states. Specifically, the state connector uses Flink SQL's MAP and ARRAY types, which implies that it attempts to load entire MAP or LIST states into memory.
>
> However, in many real-world scenarios, these states can grow very large. Typically, the state API addresses this by providing an iterator to traverse elements within the state incrementally. I'm unsure whether I've missed something in FLIP-496 or FLIP-512, but it seems that the current design might struggle with scalability in such cases.

You see it right: the current implementation keeps the state for a single key in memory. Back in the day we considered this potential issue and concluded that this is not necessarily needed for the initial version and can be done as a later improvement.

Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself. But again, this is a good feature as-is and can be handled in a separate jira.
> > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Best, > > >> > > > > > > > > > > > > > > > > Shengkai > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > [1] > > >> > > > > > > > > > > > > >> > > https://www.postgresql.org/docs/current/view-pg-tables.html > > >> > > > > > > > > > > > > > > > > [2] > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Gabor Somogyi < > > gabor.g.somo...@gmail.com> > > >> > > > > > δΊ2025εΉ΄3ζ3ζ₯ε¨δΈ > > >> > > > > > > > > > > 02:00ειοΌ > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Hi Zakelly, > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > In order to shoot for simplicity > > >> `METADATA > > >> > > > > VIRTUAL` > > >> > > > > > > as > > >> > > > > > > > > key > > >> > > > > > > > > > > > words > > >> > > > > > > > > > > > > > for > > >> > > > > > > > > > > > > > > > > > definition is the target. > > >> > > > > > > > > > > > > > > > > > When it's not super complex the > latter > > >> can > > >> > be > > >> > > > > added > > >> > > > > > > > too. 
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > BR,
> > >> > > > > > > > > > > > > > > > > > G
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > > Hi Gabor,
> > >> > > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > > +1 for this.
> > >> > > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > > Will the metadata column use `METADATA VIRTUAL` as key words for
> > >> > > > > > > > > > > > > > > > > > > definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like
> > >> > > > > > > > > > > > > > > > > > > the Kafka table?
> > >> > > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > > Best,
> > >> > > > > > > > > > > > > > > > > > > Zakelly
> > >> > > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > > On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
> > >> > > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > > > Hi All,
> > >> > > > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > > > I'd like to start a discussion of FLIP-512: Add meta information to
> > >> > > > > > > > > > > > > > > > > > > > SQL state connector [1].
> > >> > > > > > > > > > > > > > > > > > > > Feel free to add your thoughts to make this feature better
> > >> > > > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector