In the meantime I've just updated the FLIP according to this to be optimistic :)
BR,
G

On Thu, Mar 27, 2025 at 2:15 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

> Considering all the facts I also +1 on PTF. Even if something is missing we can add it later.
>
> @Zakelly Lan <zakelly....@gmail.com> @Shengkai Fang are you also on the same page or have something to add?
>
> BR,
> G
>
> On Thu, Mar 27, 2025 at 1:50 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:
>
>> +1 for PTF
>>
>>> Is it possible to describe such function to see the column names/types?
>>
>> Although Flink SQL does not directly support this feature, users can achieve similar results with the help of the `explain` syntax, e.g. 'explain select * from read_state_metadata(...)'
>>
>> Best,
>> Lincoln Lee
>>
>> On Thu, Mar 27, 2025 at 20:41, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hey!
>>>
>>> I think the PTF approach strikes a great balance in simplicity and the capabilities that we get out of it.
>>>
>>> I think this could be a completely viable alternative to the dedicated connector, +1.
>>>
>>> Cheers,
>>> Gyula
>>>
>>> On Thu, Mar 27, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>
>>>> Hi, Gabor.
>>>>
>>>>> Do I understand correctly that this is 2.x only feature and we can't backport it to 1.x line
>>>>
>>>> Yes. PTF is only supported in the 2.x version.
>>>>
>>>>> Is it possible to describe such function to see the column names/types?
>>>>
>>>> Flink SQL doesn't support this feature, but postgres[2] and mysql[1] have a similar feature.
>>>>
>>>> [1] https://dev.mysql.com/doc/refman/8.4/en/show-create-procedure.html
>>>> [2] https://stackoverflow.com/questions/6898453/show-the-code-of-a-function-procedure-and-trigger-in-postgresql
>>>>
>>>> Best,
>>>> Shengkai
>>>>
>>>> On Thu, Mar 27, 2025 at 16:25, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>
>>>>> Hi Shengkai,
>>>>>
>>>>> Thanks for your effort with the example, this looks promising.
>>>>> I like the fact that users wouldn't need to sweat with complex create table statements.
>>>>>
>>>>> Couple of questions:
>>>>> * Do I understand correctly that this is a 2.x only feature and we can't backport it to the 1.x line? I don't intend to do any backport, I would just like to know the technical constraints.
>>>>> * Is it possible to describe such a function to see the column names/types?
>>>>>
>>>>> BR,
>>>>> G
>>>>>
>>>>> On Thu, Mar 27, 2025 at 3:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>
>>>>>> Many thanks for your reminder, Leonard. Here's the link I mentioned[1].
>>>>>>
>>>>>> Best,
>>>>>> Shengkai
>>>>>>
>>>>>> [1] https://github.com/apache/flink/pull/26358
>>>>>>
>>>>>> On Thu, Mar 27, 2025 at 10:05, Leonard Xu <xbjt...@gmail.com> wrote:
>>>>>>
>>>>>>> Your link is broken, Shengkai
>>>>>>>
>>>>>>> Best,
>>>>>>> Leonard
>>>>>>>
>>>>>>> On Mar 27, 2025, at 10:01, Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, All.
>>>>>>>>
>>>>>>>> I wrote a simple demo to illustrate my idea. Hope this helps.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Shengkai
>>>>>>>>
>>>>>>>> https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1
>>>>>>>>
>>>>>>>> On Wed, Mar 26, 2025 at 15:54, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>> I'm fine with a separate SQL connector for metadata, so maybe we could update the FLIP about our discussion?
>>>>>>>>>
>>>>>>>>> Sorry, I've forgotten this part. Yeah, no matter what we choose I'm going to update the FLIP.
>>>>>>>>>
>>>>>>>>> G
>>>>>>>>>
>>>>>>>>> On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I also lack knowledge of PTFs, so I've read just the motivation part:
>>>>>>>>>>
>>>>>>>>>> "The SQL 2016 standard introduced a way of defining custom SQL operators defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic table functions). ~200 pages define how this new kind of function can consume and produce tables with various execution properties. Unfortunately, this part of the standard is not publicly available."
>>>>>>>>>>
>>>>>>>>>> Of course we can take a look at some examples but do we really want to expose state data with this construct which is described in ~200 pages and part of the standard is not publicly available? :)
>>>>>>>>>> I mean the dataset is a couple of rows and the use-case is a join with another table, like with state data.
>>>>>>>>>> If somebody can give advantages I would buy that but from my limited understanding this would be an overkill here.
>>>>>>>>>>
>>>>>>>>>> BR,
>>>>>>>>>> G
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Zakelly, Shengkai!
>>>>>>>>>>>
>>>>>>>>>>> I don't know too much about PTFs, it would be interesting to see how the usage would look in practice.
>>>>>>>>>>>
>>>>>>>>>>> Do you have some mockup/example in mind of how the PTF would look, for example when we want to:
>>>>>>>>>>> - Simply display/aggregate what's in the metadata
>>>>>>>>>>> - Join keyed state with some metadata columns
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Gyula
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm fine with a separate SQL connector for metadata, so maybe we could update the FLIP about our discussion? And Shengkai provides a PTF implementation, does that also meet the requirement?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Zakelly
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> @Zakelly: Gyula summarised correctly what I meant, so please treat the content as mine.
>>>>>>>>>>>>> As an addition, I'm not against adding a CLI at all, I'm just stating that in some cases like this, users would like to have a self-serving solution where they can provide SQL statements which can trigger alerts automatically.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My personal opinion is that a CLI would be beneficial for several cases. A good example is when users want to restart a job from specific Kafka offsets which are persisted in a savepoint. For such a scenario users are more than happy since they expect manual intervention with full control. So all in all one can count on my +1 when a CLI FLIP comes up...
>>>>>>>>>>>>>
>>>>>>>>>>>>> BR,
>>>>>>>>>>>>> G
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Zakelly Lan <zakelly....@gmail.com>
>>>>>>>>>>>>>> I think what Gabor means is that users want to have predefined SQL scripts to perform state analysis tasks to debug/identify problems.
>>>>>>>>>>>>>> For example, a SQL script that joins the metadata table with the state and does some analytics on it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we have a meta table then the SQL script that can do this is fixed and users can trigger this on demand by simply providing a new savepoint path.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we have a different mechanism to extract metadata that is not SQL native then manual steps need to be executed and a custom SQL script would need to be written that adds the manually extracted metadata into the script.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Gyula
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your answers! Getting everyone aligned on this topic is challenging, but it's definitely worth the effort since it will help streamline things moving forward.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @Gabor are you saying that users are using some scripts to define the SQL metadata connector and get the information, right? If so, would a CLI tool be more convenient? It's easy to invoke and can get the result swiftly.
>>>>>>>>>>>>>>> And there should be some other systems to track the checkpoint lineage and analyze if there are outliers in metadata (e.g. state size of one operator) right? Well, maybe I missed something so please correct me if I'm wrong.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @Gyula Well, this is a good point. From the perspective of comprehensive SQL experience, I'd +1 for treating metadata as data. Although I doubt if there is a need for processing metadata, I won't be against a separate connector.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regarding the CLI tool, I still think it's worth implementing. Such a tool could provide savepoint information before resuming from a savepoint, which would enhance the user experience in CLI-based workflows. It would be good if someone could implement this feature. We shouldn't worry about whether this tool might be retired in the future.
>>>>>>>>>>>>>>> Regardless of the SQL-based solution we eventually adopt, this capability will remain essential for CLI users. This is another topic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Zakelly
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After reading the doc[1], I think Spark provides a function for users to consume the metadata from the savepoint. In Flink SQL, similar functionality is implemented through Polymorphic Table Functions (PTF) as proposed in FLIP-440[2]. Below is a code example[3] illustrating this concept:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
>>>>>>>>>>>>>>>>     public void eval(Integer i, Boolean b) {
>>>>>>>>>>>>>>>>         collectObjects(i, b);
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So we can add a builtin function named `read_state_metadata` to read savepoint data.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
>>>>>>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
>>>>>>>>>>>>>>>> [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Mar 19, 2025 at 18:37, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi All!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you for the answers and concerns from everyone.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On the CLI vs State Metadata Connector/Table question I would also like to step back a little and look at the bigger picture.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database.
>>>>>>>>>>>>>>>>> Most features and developments in recent years have gone this way.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The State Metadata Table would be a natural and straightforward fit here. So from my side, +1 for that.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> However I could understand if we are not ready to add a new connector/format due to maintenance concerns (and in general concern about the design).
>>>>>>>>>>>>>>>>> If that's the issue then we should spend more time on the design to get comfortable with the approach and seek feedback from the wider community.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am -1 for the CLI/tooling approach as that will not provide the featureset we are looking for that is not already covered by the Java connector. And that approach would come with the same maintenance implications.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>> Gyula
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Zakelly, Shengkai
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Several topics are going on so adding gist answers to them.
>>>>>>>>>>>>>>>>>> When some topic is not touched please highlight it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> @Shengkai: I've read through all the previous FLIPs related to catalogs, and if we would like to keep the concepts there then a one-to-one mapping relationship between savepoint and catalog is a reasonable direction. In short I'm happy that you've highlighted this and agree as a whole. I've written it down previously, just want to double confirm that the state catalog is essential and planned. When we reach this point then your input is more than welcome.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> @Zakelly: We've tried the CLI and separate library approaches with users already and these are not welcome because of the following:
>>>>>>>>>>>>>>>>>> * Users want to have automated tasks and not manual CLI/library output parsing. This can be hacked around but our experience is negative on this because it's just brittle.
>>>>>>>>>>>>>>>>>> * From a development perspective it's a much bigger effort than a connector (hard to test, packaging/version handling is an extra layer of complexity, external FS authentication is a pain for users, and so is expecting them to download savepoints)
>>>>>>>>>>>>>>>>>> * Purely personal opinion, but if we find better ways later then retiring a CLI is no more lightweight than retiring a connector
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It would be great if you give some examples on how user could leverage the separate connector to process the metadata.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The simplest cases:
>>>>>>>>>>>>>>>>>> * give me the overgrowing state uids
>>>>>>>>>>>>>>>>>> * give me the not known (new or renamed) state uids
>>>>>>>>>>>>>>>>>> * give me the state uids where state size drastically dropped compared to a previous savepoint (accidental state loss)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Since it was mentioned: as a general offtopic teaser, yeah it would be good to have some sort of checkpoint/savepoint lineage or however we call it.
>>>>>>>>>>>>>>>>>> Since we've not yet reached this point there are no technical details, it's more like a vision. It's a common pattern that jobs are physically running but somehow the state processing is stuck, and it would be good to add some way to find it out automatically.
>>>>>>>>>>>>>>>>>> The important thing here is automation and not manual evaluation, since handling 10k+ jobs just doesn't allow that.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>> G
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi, All.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> About State Catalog, I want to share more thoughts about this.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In the initial design concept, I understood that a savepoint and a state catalog have a one-to-one mapping relationship. Each operator corresponds to a database, and the state of each operator is represented as individual tables. The rationale behind this design is:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *State Diversity*: An operator may involve multiple types of states.
>>>>>>>>>>>>>>>>>>> For example, in our VVR design, a "multi-join" operator uses keyed states for two input streams and a broadcast state for the third stream. This makes it challenging to represent all states of an operator within a single table.
>>>>>>>>>>>>>>>>>>> *Scalability*: Internally, an operator might have multiple keyed states (e.g., value state and list state). However, large list states may not fit entirely in memory. To address this, we recommend implementing each state as a separate table.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> To resolve the loosely coupled relationships between operator states, we propose embedding predefined views within the catalog. These views simplify user understanding of operator implementations and provide a more intuitive perspective.
>>>>>>>>>>>>>>>>>>> For instance, a join operator may have multiple state implementations (depending on whether the join key includes unique attributes), but users primarily care about the data associated with a specific join key across input streams.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Returning to the one-to-one mapping between savepoints and catalogs, we aim to manage multiple user state catalogs through a catalog store. When a user triggers a savepoint for a job on the platform:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. The platform sends a REST request to the JobManager.
>>>>>>>>>>>>>>>>>>> 2. Simultaneously, it registers a new state catalog in the catalog store, enabling immediate analysis of state data on the platform.
>>>>>>>>>>>>>>>>>>> 3. Deleting a savepoint would also trigger the removal of its associated catalog.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This vision assumes that states are self-describing or that a state metaservice is introduced to analyze savepoint structures.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> How can users create logic to identify differences between multiple savepoints?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Since savepoints and state catalogs are one-to-one mapped, users can query metadata via their respective catalogs. For example:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>` provides operator-specific metadata (e.g., state size, type).
>>>>>>>>>>>>>>>>>>> 2. Comparing metadata tables (e.g., schema versions, state entry counts) across catalogs reveals structural or quantitative differences.
>>>>>>>>>>>>>>>>>>> 3. For deeper analysis, users could write SQL queries to compare specific state partitions or leverage the metaservice to track state evolution (e.g., added/removed operators, modified state configurations).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If we plan to introduce a state catalog in the future, I would lean toward using metadata tables. If a utility tool can address the challenges we face, could we avoid introducing an additional connector?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Mar 17, 2025 at 20:25, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi All!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Without going into too much detail here are my 2 cents regarding the virtual column / catalog metadata / table (connector) discussion for the State metadata.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> State metadata such as the types of states, their properties, names, sizes etc. are all valuable information that can be used to enrich the computations we do on state.
>>>>>>>>>>>>>>>>>>>> We can either analyze it standalone (such as discovering anomalies, for large jobs with many states), across multiple savepoints (discovering how state changed over time) or by joining it with keyed or non-keyed state data to serve more complex queries on the state.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The only solution that seems to serve all these use-cases and requirements in a straightforward and SQL canonical way is to simply expose the state metadata as a separate table.
This is a metadata >> > table >> > > > > > >>>> but >> > > > > > >>>>> you >> > > > > > >>>>>>>> can >> > > > > > >>>>>>>>>> also >> > > > > > >>>>>>>>>>>>> think of it as data table, it makes no practical >> > > > > > >>>> difference >> > > > > > >>>>>>> here. >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> Once we have a catalog later, the catalog can >> offer >> > > > > > >> this >> > > > > > >>>>> table >> > > > > > >>>>>>>> out >> > > > > > >>>>>>>>> of >> > > > > > >>>>>>>>>>> the >> > > > > > >>>>>>>>>>>>> box, the same way databases provide metadata >> tables. >> > > > > > >> For >> > > > > > >>>>> this >> > > > > > >>>>>>> to >> > > > > > >>>>>>>>> work >> > > > > > >>>>>>>>>>>>> however we need another, simpler connector that >> > creates >> > > > > > >>>> this >> > > > > > >>>>>>>> table. >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> +1 for state metadata as a separate >> connector/table, >> > > > > > >>>> instead >> > > > > > >>>>>>> of >> > > > > > >>>>>>>>>> adding >> > > > > > >>>>>>>>>>>>> virtual columns and adhoc catalog metadata that is >> > hard >> > > > > > >>>> to >> > > > > > >>>>> use >> > > > > > >>>>>>>> in a >> > > > > > >>>>>>>>>>> large >> > > > > > >>>>>>>>>>>>> number of queries. >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> Cheers, >> > > > > > >>>>>>>>>>>>> Gyula >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>> On Mon, Mar 17, 2025 at 12:44β―PM Gabor Somogyi < >> > > > > > >>>>>>>>>>>> gabor.g.somo...@gmail.com> >> > > > > > >>>>>>>>>>>>> wrote: >> > > > > > >>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>>> 1. State TTL for Value Columns >> > > > > > >>>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>>>> Iβm planning on adding this, and we may >> collaborate >> > > > > > >>>> on >> > > > > > >>>>> it >> > > > > > >>>>>>> in >> > > > > > >>>>>>>>> the >> > > > > > >>>>>>>>>>>>> future. >> > > > > > >>>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>>> +1 on this, just ping me. >> > > > > > >>>>>>>>>>>>>> >> > > > > > >>>>>>>>>>>>>> 2. 
Metadata Table vs. Metadata Column

After some code digging and a POC, all I can say is that with heavy effort we can maybe add changes that let us show the metadata of a savepoint from the catalog.
I'm not against that, but from the user perspective this has limited value; let me explain why.

From a high-level perspective I see the following, on which I see agreement:
* We should have a catalog which represents the savepoint data set of one or more jobs (future plan)
* Savepoints should be able to be registered in the catalog, where they then appear as databases (future plan)
* There must be a possibility to create tables from databases where users can read state data (exists already)

In terms of metadata, if I understand correctly, the suggested approach would be to access it from the catalog
describe command, right? Adding that info when a specific database describe command is executed could be done.

The question is, for instance, how users can create logic that tells them the difference between multiple savepoints.
Just to give some examples:
* per operator size changes between savepoints
* show values from operator data where state size reaches a boundary
* in general, "find which checkpoint ruined things" is quite a common pattern
What I would like to highlight here is that from the Flink point of view the metadata can be considered static side output information, but for users these values are actual real data around which logic is planned to be built.

> The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill.
State data is also static within a savepoint, and that's the reason why the state processor API works in batch mode.
When we handle multiple checkpoints in a streaming fashion, this can be viewed from another angle.

We can come up with a more lightweight solution than a new connector, but forcing users to parse the catalog describe command output in order to compare multiple savepoints doesn't sound like a smooth user experience.
Honestly, I have no other idea for exposing metadata as real user data, so I'm waiting for other approaches.

BR,
G

On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com> wrote:

Looking forward to hearing the good news!

Best,
Shengkai

On Wed, Mar 12, 2025 at 22:24, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Thanks to both of you for the valuable input!

Let me take a closer look at the suggestions, like the Catalog capabilities and the possibility of embedding TypeInformation or StateDescriptor metadata directly into the raw state files...

BR,
G

On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:

Thanks for Zakelly's clarification.

1. State TTL for Value Columns

+1 to delay the discussion about this.

2. Metadata Table vs. Metadata Column

I'd like to share my perspective on the State Catalog proposal.
While introducing this capability is beneficial, there is a blocker: the current StateBackend architecture does not permit operators to encode TypeInformation into the state; it only preserves the Serializer. This limitation creates an asymmetry, as operators alone retain knowledge of the data structure's schema.

To address this, I suggest allowing operators to embed TypeInformation or StateDescriptor metadata directly into the raw state files. Such a design would enable the Catalog to:

1. Parse state files and programmatically derive the schema and structural guarantees for each state.
2.
Leverage existing Flink Table utilities, such as LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to bridge TypeInformation and DataType conversions.

If we cannot store the TypeInformation or StateDescriptor in the raw state files, I am +1 for this FLIP to use a metadata column to retrieve the information.

Best,
Shengkai

On Wed, Mar 12, 2025 at 12:43, Zakelly Lan <zakelly....@gmail.com> wrote:

Hi Gabor and Shengkai,

Thanks for sharing your thoughts! This is a long discussion and sorry for the late reply (I'm busy catching up with release 2.0 these days).

1.
State TTL for Value Columns

Let me first clarify your thoughts to ensure I understand correctly. IIUC, there is no persistent configuration for state TTL in the checkpoint. While you can infer that TTL is enabled by reading the serializer, the checkpoint itself only stores the last access time for each value. So the only thing we can show is the last access time for each value. But state backends are not required to store this, as they may store the expiration time directly. Exposing it would also increase the difficulty of implementation & maintenance.

This once again reiterates the importance of unified metadata for checkpoints.
I'm planning on adding this, and we may collaborate on it in the future.

2. Metadata Table vs. Metadata Column

I'm not in favor of adding a new connector for metadata. The metadata is more like one-time information than streaming data that changes all the time, so a dedicated connector seems to be overkill. It is not easy to withdraw a connector if we have a better solution in the future. I'm not familiar with current Catalog capabilities, and if it could extract and show some operator-level information from a savepoint, that would be great.

If the Catalog can't do that, I would consider the current FLIP to be a compromise solution.

And if we have that unified metadata for checkpoint/savepoint in the future, we may directly register a savepoint in the catalog, and create a source without specifying complex columns, as well as describe the savepoint catalog to get the metadata. That's a good solution in my mind.

Best,
Zakelly

On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi Gabor,

> 2.
Adding a new connector with `savepoint-metadata`

I would argue against introducing a new connector type named savepoint-metadata, as the existing Catalog mechanism can inherently provide the necessary connector factory capabilities. I've detailed this proposal in branch[1]. Please take a moment to review it.

If we introduce a connector named `savepoint-metadata`, it means users can create a temporary table with connector `savepoint-metadata`, and the connector needs to check whether the table schema is the same as the schema we proposed in the FLIP. On the other hand, it's not easy for others to use a metadata table with the same schema.

[1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63

Best,
Shengkai

On Tue, Mar 11, 2025 at 16:56, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Hi Shengkai,

> 1. State TTL for Value Columns

From a directional perspective I agree with your idea of how it can be implemented.
Previously I've mentioned that TTL information is not exposed on the state processor API (which the SQL state connector uses to read data), and unless somebody shows me the opposite this FLIP is not going to address this, to avoid feature creep. Our users are also interested in TTL, so sooner or later we're going to expose it; this is a matter of scheduling.

> 2. Adding a new connector with `savepoint-metadata`

I'm not sure I understand your point related to StateCatalog.
First of all, I can't agree more that StateCatalog is needed and is a planned building block in an upcoming FLIP, but I'm not sure how it can help now. No matter what, your knowledge is essential when we add StateCatalog. Let me lay out my understanding in this area:
* First we need create table statements to access state data and metadata
* When we have that, we can add StateCatalog, which could potentially ease the life of users by, e.g.,
giving off-the-shelf tables without sweating with create table statements

User expectations:
* See state data (this is fulfilled with the existing connector)
* See metadata about state data like TTL (this can be added as a metadata column as you suggested, since it belongs to the data)
* See metadata about operators (this can be added from savepoint-metadata)

It is important to highlight that the state data table format differs from the state metadata table format. Namely, one table has rows for state values and another has rows for operators, right?
I think that's the reason why you've pointed out that the suggested metadata columns are somewhat clunky.

In conclusion, I agree to add the ${state-name}_ttl metadata column later on, since it belongs to the state value, and to add a new table type for metadata (like you suggested, similar to PG [1]). Please see how Spark does that too [2].

If you have a better approach, please elaborate with more details and help me understand your point.

> Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself.
> But again, this is a good feature as-is and can be handled in a separate jira.

I've just created https://issues.apache.org/jira/browse/FLINK-37456.

[1] https://www.postgresql.org/docs/current/view-pg-tables.html
[2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source

BR,
G

On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi, Gabor. Thanks for your response.

> 1. State TTL for Value Columns

Thank you for addressing the limitations here.
However, I believe it would be beneficial to further clarify the API in this FLIP regarding how users can specify the TTL column.

One potential approach that comes to mind is using a standardized naming convention such as ${state-name}_ttl for the metadata column that defines the TTL value. In terms of implementation, the listReadableMetadata function could:

1. Read the table's columns and configuration,
2. Extract all defined state names, and
3. Return a structured list of metadata entries formatted as ${state-name}_ttl.

WDYT?

> 2.
Adding a new connector with `savepoint-metadata`

Introducing a new connector type at this stage may unnecessarily complicate the system. Given that every table already belongs to a Catalog, which is designed to provide a Factory for building source or sink connectors, I propose integrating a dedicated StateCatalog instead. This approach would allow us to:

1. Leverage the Catalog's existing capabilities to manage TTL metadata (e.g., state names and TTL logic) without duplicating functionality.
2. Provide a unified interface for connector instantiation and metadata handling through the Catalog's Factory pattern.

Would this design decision better align with our architecture's extensibility and reduce redundancy?

> Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself.
> But again, this is a good feature as-is and can be handled in a separate jira.

+1 for a separate jira.

Best,
Shengkai

On Mon, Mar 10, 2025 at 19:05, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Hi Shengkai,

Please see my comments inline.
>>
>> BR,
>> G
>>
>>
>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>
>>> Hi, Gabor. Thanks for the FLIP. I have some questions about the FLIP:
>>>
>>> 1. State TTL for Value Columns
>>> How can users retrieve the state TTL (Time-to-Live) for each value column?
>>> From my understanding of the current design, it seems that this
>>> functionality is not supported. Could you clarify if there are plans to
>>> address this limitation?
>>>
>>
>> Since the state processor API is not yet exposing this information, this
>> would require several steps.
>> First, the state processor API support needs to be added, which can then
>> be exposed on the SQL API.
>> This is definitely a future improvement which is useful and can be handled
>> in a separate jira.
>>
>>
>>> 2. Metadata Table vs. Metadata Column
>>> The metadata information described in the FLIP appears to be intended to
>>> describe the state files stored at a specific location.
>>> To me, this concept aligns more closely with system tables like pg_tables
>>> in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
>>>
>>
>> Adding a new connector with `savepoint-metadata` is a possibility where we
>> can create such functionality.
>> I'm not against that, just want to have a common agreement that we would
>> like to move in that direction.
>> (As a side note, not just PG but Spark also has a similar approach, and I
>> basically like the idea.)
>> If we would go that direction, savepoint metadata can be reached in a way
>> that one row would represent an operator with its values, something like
>> this:
>>
>> ┌─────────────────────────┬─────────────────────────────┬────────────────────────────────┬───────────┬──────────────┬──────────────────┬───────────────────────────┬──────────────────────┐
>> │operatorName             │operatorUid                  │operatorHash                    │parallelism│maxParallelism│subtaskStatesCount│coordinatorStateSizeInBytes│totalStatesSizeInBytes│
>> ├─────────────────────────┼─────────────────────────────┼────────────────────────────────┼───────────┼──────────────┼──────────────────┼───────────────────────────┼──────────────────────┤
>> │Source: datagen-source   │datagen-source-uid           │47aee94394d6ea26e2d544bef0a37bb5│2          │128           │2                 │16                         │546                   │
>> │long-udf-with-master-hook│long-udf-with-master-hook-uid│6ed3f40bff3c8dfcdfcb95128a1018f1│2          │128           │2                 │0                          │0                     │
>> │value-process            │value-process-uid            │ca4f5fe9a637b656f09ea78b3e7a15b9│2          │128           │2                 │0                          │40726                 │
>> └─────────────────────────┴─────────────────────────────┴────────────────────────────────┴───────────┴──────────────┴──────────────────┴───────────────────────────┴──────────────────────┘
>>
>> This table can then be joined with the tables created by the already
>> existing `savepoint` connector based on the UID hash (which is unique and
>> always exists).
>> This would mean that the already existing table would need only a single
>> metadata column, which is the UID hash.
>> WDYT?
>> @zakelly, plz share your thoughts too.
>>
>>
>>> If we opt to use metadata columns, every record in the table would end up
>>> having identical values for these columns (please correct me if I'm
>>> mistaken). On the other hand, the state connector requires users to
>>> specify an operator UID or operator UID hash, after which it outputs
>>> user-defined values in its records. This approach feels somewhat
>>> redundant to me.
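
Gabor's proposal to join a per-operator metadata table with the existing `savepoint` tables on the UID hash can be illustrated with a small, runnable sketch. It uses SQLite only as a stand-in for Flink SQL, and the table names `savepoint_metadata` and `value_state`, as well as the state rows, are hypothetical; only the UIDs, hashes and sizes are taken from the example rows in Gabor's message.

```python
import sqlite3

# SQLite stands in for Flink SQL; table names and state rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE savepoint_metadata ("
    "operator_uid TEXT, operator_hash TEXT, total_states_size_in_bytes INTEGER)"
)
conn.execute(
    "CREATE TABLE value_state ("
    "state_key INTEGER, state_value INTEGER, operator_hash TEXT)"
)

VALUE_PROCESS_HASH = "ca4f5fe9a637b656f09ea78b3e7a15b9"
conn.executemany(
    "INSERT INTO savepoint_metadata VALUES (?, ?, ?)",
    [
        ("datagen-source-uid", "47aee94394d6ea26e2d544bef0a37bb5", 546),
        ("value-process-uid", VALUE_PROCESS_HASH, 40726),
    ],
)
# Every state row of one operator carries the same hash: this is the single
# metadata column the state table would need.
conn.executemany(
    "INSERT INTO value_state VALUES (?, ?, ?)",
    [(1, 10, VALUE_PROCESS_HASH), (2, 20, VALUE_PROCESS_HASH)],
)

# Join state rows with operator-level metadata on the UID hash.
rows = conn.execute(
    "SELECT s.state_key, s.state_value, m.operator_uid, "
    "m.total_states_size_in_bytes "
    "FROM value_state AS s "
    "JOIN savepoint_metadata AS m ON s.operator_hash = m.operator_hash "
    "ORDER BY s.state_key"
).fetchall()

for row in rows:
    print(row)
```

The sketch also makes Shengkai's point visible: the hash column is identical on every state row of an operator, so the operator-level details live once in the metadata table rather than being repeated per record.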
>>>
>>
>> If we would add a new `savepoint-metadata` connector then this can be
>> addressed.
>> On the other hand, UID and UID hash have an either-or relationship from a
>> config perspective, so when a user provides the UID then he/she can be
>> interested in the hash for further calculations (the whole Flink internals
>> depend on the hash). Printing out the human-readable UID is an explicit
>> requirement from the user side because hashes are not human readable.
>>
>>
>>> 3. Handling LIST and MAP States in the State Connector
>>> I have concerns about how the current design handles LIST and MAP states.
>>> Specifically, the state connector uses Flink SQL's MAP and ARRAY types,
>>> which implies that it attempts to load entire MAP or LIST states into
>>> memory.
>>>
>>> However, in many real-world scenarios, these states can grow very large.
>>> Typically, the state API addresses this by providing an iterator to
>>> traverse elements within the state incrementally.
>>> I'm unsure whether I've missed something in FLIP-496 or FLIP-512, but it
>>> seems that the current design might struggle with scalability in such
>>> cases.
>>>
>>
>> You see it correctly, the current implementation keeps the state for a
>> single key in memory.
>> Back in the day we considered this potential issue and concluded that this
>> is not necessarily needed for the initial version and can be done as a
>> later improvement.
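
The difference between loading a whole LIST state for one key and traversing it incrementally can be sketched in plain Python. This is not the Flink state API; `fake_list_state`, `load_whole_state` and `iterate_state` are hypothetical names used only to show why an iterator keeps memory bounded while an ARRAY-style column does not.

```python
from typing import Iterable, Iterator, List


def load_whole_state(elements: Iterable[int]) -> List[int]:
    """Materialize every element at once (what an ARRAY column implies)."""
    return list(elements)


def iterate_state(elements: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield the state in bounded batches so at most batch_size items are held."""
    batch: List[int] = []
    for element in elements:
        batch.append(element)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield batch


def fake_list_state(size: int) -> Iterator[int]:
    """Hypothetical stand-in for one key's list state, produced lazily."""
    return iter(range(size))


# Eager load: memory grows with the size of the per-key state.
whole = load_whole_state(fake_list_state(10))

# Incremental traversal: never more than batch_size elements in memory.
largest_batch = max(len(b) for b in iterate_state(fake_list_state(10), batch_size=4))
total = sum(sum(b) for b in iterate_state(fake_list_state(10), batch_size=4))
```

Both paths see the same elements; only the peak number of elements held at once differs, which matters when a single key's state is very large.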
>>
>> Up until now we've seen even in TB savepoints that the number of keys can
>> be extremely huge but not the per key state itself.
>> But again, this is a good feature as-is and can be handled in a separate
>> jira.
>>
>>
>>>
>>> Best,
>>> Shengkai
>>>
>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>>>
>>> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 3, 2025 at 02:00:
>>>
>>>> Hi Zakelly,
>>>>
>>>> In order to shoot for simplicity, `METADATA VIRTUAL` as key words for
>>>> definition is the target.
>>>> When it's not super complex the latter can be added too.
>>>>
>>>> BR,
>>>> G
>>>>
>>>>
>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>>>>
>>>>> Hi Gabor,
>>>>>
>>>>> +1 for this.
>>>>>
>>>>> Will the metadata column use `METADATA VIRTUAL` as key words for
>>>>> definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
>>>>> Kafka table?
>>>>>
>>>>>
>>>>> Best,
>>>>> Zakelly
>>>>>
>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'd like to start a discussion of FLIP-512: Add meta information to SQL
>>>>>> state connector [1].
>>>>>> Feel free to add your thoughts to make this feature better.
>>>>>>
>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
>>>>>>
>>>>>> BR,
>>>>>> G