One more question about the FLIP. I think the output schema is definitely a public API for users. If users register the function with a `CREATE FUNCTION` statement, does that mean the class path is also a public API? Or is this merely an experimental feature about which we make no promises?

Best,
Shengkai

On Fri, Mar 28, 2025 at 10:20, Shengkai Fang <fskm...@gmail.com> wrote:

> +1 to use PTF.
>
> I would like to raise a consideration regarding usage: would it be necessary to allow users to register the PTF with the CREATE FUNCTION statement?
>
> Currently, Flink SQL lets external systems register modules and use these modules to centrally manage all function definitions. Given this architectural approach, I’m curious whether the plan involves introducing additional functions in the future. If so, I would advocate introducing a dedicated state module to centralize such management. This would empower users to:
>
> 1. Simply execute the LOAD MODULE command to load the required module, and
> 2. Directly invoke read_metadata thereafter.
>
> For more details about modules, please refer to this document[1].
>
> Best,
> Shengkai
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/
>
> On Fri, Mar 28, 2025 at 00:26, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
>> Just found out that PTF is not supported in batch mode, please see the dev mailing list thread about it [1].
>>
>> [1] https://lists.apache.org/thread/ytm9m1qt4pq2q2gjngfktrn8vrlvkf07
>>
>> BR,
>> G
>>
>> On Thu, Mar 27, 2025 at 3:38 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>
>>> In the meantime I've just updated the FLIP according to this to be optimistic 🙂
>>>
>>> BR,
>>> G
>>>
>>> On Thu, Mar 27, 2025 at 2:15 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>
>>>> Considering all the facts I also +1 on PTF. Even if something is missing we can add it later.
>>>>
>>>> @Zakelly Lan <zakelly....@gmail.com> @Shengkai Fang are you also on the same page or have something to add?
>>>>
>>>> BR,
>>>> G
>>>>
>>>> On Thu, Mar 27, 2025 at 1:50 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:
>>>>
>>>>> +1 for PTF
>>>>>
>>>>>> Is it possible to describe such function to see the column names/types?
>>>>>
>>>>> Although Flink SQL does not directly support this feature, users can achieve similar results with the help of the `explain` syntax, e.g. 'explain select * from read_state_metadata(...)'
>>>>>
>>>>> Best,
>>>>> Lincoln Lee
>>>>>
>>>>> On Thu, Mar 27, 2025 at 20:41, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>
>>>>>> Hey!
>>>>>>
>>>>>> I think the PTF approach strikes a great balance between simplicity and the capabilities that we get out of it.
>>>>>>
>>>>>> I think this could be a completely viable alternative to the dedicated connector, +1.
>>>>>>
>>>>>> Cheers,
>>>>>> Gyula
>>>>>>
>>>>>> On Thu, Mar 27, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, Gabor.
>>>>>>>
>>>>>>>> Do I understand correctly that this is 2.x only feature and we can't backport it to 1.x line
>>>>>>>
>>>>>>> Yes. PTF is only supported in the 2.x versions.
>>>>>>>
>>>>>>>> Is it possible to describe such function to see the column names/types?
>>>>>>>
>>>>>>> Flink SQL doesn't support this feature, but postgres[2] and mysql[1] have a similar feature.
>>>>>>>
>>>>>>> [1] https://dev.mysql.com/doc/refman/8.4/en/show-create-procedure.html
>>>>>>> [2] https://stackoverflow.com/questions/6898453/show-the-code-of-a-function-procedure-and-trigger-in-postgresql
>>>>>>>
>>>>>>> Best,
>>>>>>> Shengkai
>>>>>>>
>>>>>>> On Thu, Mar 27, 2025 at 16:25, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Shengkai,
>>>>>>>>
>>>>>>>> Thanks for your effort with the example, this looks promising.
>>>>>>>> I like the fact that users wouldn't need to sweat with complex create table statements.
>>>>>>>>
>>>>>>>> Couple of questions:
>>>>>>>> * Do I understand correctly that this is a 2.x-only feature and we can't backport it to the 1.x line? I don't intend to do any backport, I would just like to know the technical constraints.
>>>>>>>> * Is it possible to describe such a function to see the column names/types?
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> G
>>>>>>>>
>>>>>>>> On Thu, Mar 27, 2025 at 3:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Many thanks for your reminder, Leonard. Here's the link I mentioned[1].
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Shengkai
>>>>>>>>>
>>>>>>>>> [1] https://github.com/apache/flink/pull/26358
>>>>>>>>>
>>>>>>>>> On Thu, Mar 27, 2025 at 10:05, Leonard Xu <xbjt...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Your link is broken, Shengkai
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Leonard
>>>>>>>>>>
>>>>>>>>>> On Mar 27, 2025, at 10:01, Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, All.
>>>>>>>>>>>
>>>>>>>>>>> I wrote a simple demo to illustrate my idea. Hope this helps.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Shengkai
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 26, 2025 at 15:54, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> I'm fine with a separate SQL connector for metadata, so maybe we could update the FLIP about our discussion?
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry, I've forgotten this part. Yeah, no matter which we choose, I'm going to update the FLIP.
>>>>>>>>>>>>
>>>>>>>>>>>> G
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also lack knowledge of PTFs, so I've read just the motivation part:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "The SQL 2016 standard introduced a way of defining custom SQL operators defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic table functions). ~200 pages define how this new kind of function can consume and produce tables with various execution properties. Unfortunately, this part of the standard is not publicly available."
>>>>>>>>>>>>>
>>>>>>>>>>>>> Of course we can take a look at some examples, but do we really want to expose state data with this construct, which is described in ~200 pages and whose part of the standard is not publicly available? 🙂
>>>>>>>>>>>>> I mean the dataset is a couple of rows and the use-case is a join with another table, such as state data.
>>>>>>>>>>>>> If somebody can name advantages I would buy that, but from my limited understanding this would be overkill here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> BR,
>>>>>>>>>>>>> G
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Zakelly, Shengkai!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know too much about PTFs, so it would be interesting to see how the usage would look in practice.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you have some mockup/example in mind of how the PTF would look, for example when we want to:
>>>>>>>>>>>>>> - Simply display/aggregate what's in the metadata
>>>>>>>>>>>>>> - Join keyed state with some metadata columns
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Gyula
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm fine with a separate SQL connector for metadata, so maybe we could update the FLIP about our discussion? And Shengkai provides a PTF implementation; does that also meet the requirement?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Zakelly
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @Zakelly: Gyula correctly summarised what I meant, so please treat the content as mine.
>>>>>>>>>>>>>>>> In addition, I'm not against adding a CLI at all. I'm just stating that in some cases like this, users would like to have a self-serving solution where they can provide SQL statements which can trigger alerts automatically.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My personal opinion is that a CLI would be beneficial in several cases. A good example is when users want to restart a job from specific Kafka offsets which are persisted in a savepoint. In such a scenario users are more than happy, since they expect manual intervention with full control. So all in all one can count on my +1 when a CLI FLIP comes up...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>> G
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> @Zakelly Lan <zakelly....@gmail.com>
>>>>>>>>>>>>>>>>> I think what Gabor means is that users want to have predefined SQL scripts to perform state analysis tasks to debug/identify problems, such as a SQL script that joins the metadata table with the state and does some analytics on it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If we have a meta table, then the SQL script that can do this is fixed, and users can trigger it on demand by simply providing a new savepoint path.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If we have a different mechanism to extract metadata that is not SQL native, then manual steps need to be executed and a custom SQL script would need to be written that adds the manually extracted metadata into the script.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Gyula
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for your answers!
>>>>>>>>>>>>>>>>>> Getting everyone aligned on this topic is challenging, but it’s definitely worth the effort since it will help streamline things moving forward.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> @Gabor are you saying that users are using some scripts to define the SQL metadata connector and get the information, right? If so, would a CLI tool be more convenient? It's easy to invoke and can get the result swiftly. And there should be some other systems to track the checkpoint lineage and analyze if there are outliers in metadata (e.g. state size of one operator), right? Well, maybe I missed something, so please correct me if I'm wrong.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> @Gyula Well, this is a good point. From the perspective of a comprehensive SQL experience, I'd +1 for treating metadata as data.
>>>>>>>>>>>>>>>>>> Although I doubt there is a need for processing metadata, I won't be against a separate connector.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regarding the CLI tool, I still think it’s worth implementing. Such a tool could provide savepoint information before resuming from a savepoint, which would enhance the user experience in CLI-based workflows. It would be good if someone could implement this feature. We shouldn’t worry about whether this tool might be retired in the future. Regardless of the SQL-based solution we eventually adopt, this capability will remain essential for CLI users. This is another topic.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Zakelly
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> After reading the doc[1], I think Spark provides a function for users to consume the metadata from the savepoint.
>>>>>>>>>>>>>>>>>>> In Flink SQL, similar functionality is implemented through Polymorphic Table Functions (PTF) as proposed in FLIP-440[2]. Below is a code example[3] illustrating this concept:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
>>>>>>>>>>>>>>>>>>>     public void eval(Integer i, Boolean b) {
>>>>>>>>>>>>>>>>>>>         collectObjects(i, b);
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>> INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So we can add a builtin function named `read_state_metadata` to read savepoint data.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
>>>>>>>>>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
>>>>>>>>>>>>>>>>>>> [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Mar 19, 2025 at 18:37, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi All!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you for the answers and concerns from everyone.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On the CLI vs State Metadata Connector/Table question I would also like to step back a little and look at the bigger picture.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database.
>>>>>>>>>>>>>>>>>>>> Most features and developments in recent years have gone this way.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The State Metadata Table would be a natural and straightforward fit here. So from my side, +1 for that.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> However, I could understand if we are not ready to add a new connector/format due to maintenance concerns (and general concerns about the design).
>>>>>>>>>>>>>>>>>>>> If that's the issue, then we should spend more time on the design to get comfortable with the approach and seek feedback from the wider community.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am -1 for the CLI/tooling approach, as it will not provide the feature set we are looking for beyond what is already covered by the Java connector. And that approach would come with the same maintenance implications.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>> Gyula
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Zakelly, Shengkai
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Several topics are in flight, so I'm adding short answers to each. If some topic is not touched, please highlight it.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> @Shengkai: I've read through all the previous FLIPs related to catalogs, and if we would like to keep the concepts there, then a one-to-one mapping relationship between savepoint and catalog is a reasonable direction. In short, I'm happy that you've highlighted this and agree as a whole. I've written it down previously, and just want to double confirm that a state catalog is essential and planned. When we reach this point, your input is more than welcome.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> @Zakelly: We've already tried the CLI and separate library approaches with users, and these are not welcome because of the following:
>>>>>>>>>>>>>>>>>>>>> * Users want to have automated tasks and not manual CLI/library output parsing. This can be hacked around, but our experience with this is negative because it's just brittle.
>>>>>>>>>>>>>>>>>>>>> * From a development perspective it's a much bigger effort than a connector (hard to test, packaging/version handling is an extra layer of complexity, external FS authentication is a pain for users, and they are also expected to download savepoints)
>>>>>>>>>>>>>>>>>>>>> * Purely personal opinion, but if we find better ways later, then retiring a CLI is no more lightweight than retiring a connector
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It would be great if you give some examples on how user could leverage the separate connector to process the metadata.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The simplest cases:
>>>>>>>>>>>>>>>>>>>>> * give me the overgrowing state uids
>>>>>>>>>>>>>>>>>>>>> * give me the unknown (new or renamed) state uids
>>>>>>>>>>>>>>>>>>>>> * give me the state uids where the state size dropped drastically compared to a previous savepoint (accidental state loss)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Since it was mentioned: as a general off-topic teaser, yeah, it would be good to have some sort of checkpoint/savepoint lineage or however we call it. Since we've not yet reached this point there are no technical details, it's more like a vision. It's a common pattern that jobs are physically running but somehow the state processing is stuck, and it would be good to add some way to find that out automatically.
>>>>>>>>>>>>>>>>>>>>> The key word here is automation and not manual evaluation, since handling 10k+ jobs simply does not allow that.
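
The three checks listed above all compare per-uid state sizes between two savepoints. A minimal Java sketch of that comparison logic follows. It assumes the sizes have already been read into plain maps from state uid to bytes; the class name, method names, and thresholds are hypothetical and not part of Flink or the FLIP. With a metadata table, the same checks would presumably be written as SQL joins instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not a Flink API: compares state sizes (bytes per state uid)
// taken from two savepoints of the same job.
class StateMetadataDiff {

    // Uids present in both savepoints whose size grew by more than growthFactor.
    static List<String> overgrowing(Map<String, Long> previous, Map<String, Long> current, double growthFactor) {
        List<String> result = new ArrayList<>();
        // TreeMap gives a deterministic (sorted) iteration order.
        for (Map.Entry<String, Long> entry : new TreeMap<>(current).entrySet()) {
            Long before = previous.get(entry.getKey());
            if (before != null && entry.getValue() > before * growthFactor) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Uids that exist only in the current savepoint (new or renamed states).
    static List<String> unknown(Map<String, Long> previous, Map<String, Long> current) {
        List<String> result = new ArrayList<>();
        for (String uid : new TreeMap<>(current).keySet()) {
            if (!previous.containsKey(uid)) {
                result.add(uid);
            }
        }
        return result;
    }

    // Uids whose size fell below keepFactor times the previous size (possible state loss).
    static List<String> shrunken(Map<String, Long> previous, Map<String, Long> current, double keepFactor) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : new TreeMap<>(current).entrySet()) {
            Long before = previous.get(entry.getKey());
            if (before != null && entry.getValue() < before * keepFactor) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sizes for illustration only.
        Map<String, Long> previous = Map.of("source", 100L, "agg", 1_000L, "join", 500L);
        Map<String, Long> current = Map.of("source", 110L, "agg", 5_000L, "join", 10L, "dedup", 50L);
        System.out.println("overgrowing: " + overgrowing(previous, current, 2.0));
        System.out.println("unknown: " + unknown(previous, current));
        System.out.println("shrunken: " + shrunken(previous, current, 0.5));
    }
}
```

The thresholds (2.0 and 0.5) are arbitrary; an automated alert would need values tuned per job.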
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>>> G
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi, All.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> About State Catalog, I want to share more thoughts about this.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In the initial design concept, I understood that a savepoint and a state catalog have a one-to-one mapping relationship. Each operator corresponds to a database, and the state of each operator is represented as individual tables. The rationale behind this design is:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> *State Diversity*: An operator may involve multiple types of states. For example, in our VVR design, a "multi-join" operator uses keyed states for two input streams and a broadcast state for the third stream. This makes it challenging to represent all states of an operator within a single table.
>>>>>>>>>>>>>>>>>>>>>> *Scalability*: Internally, an operator might have multiple keyed states (e.g., value state and list state). However, large list states may not fit entirely in memory. To address this, we recommend implementing each state as a separate table.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> To resolve the loosely coupled relationships between operator states, we propose embedding predefined views within the catalog. These views simplify user understanding of operator implementations and provide a more intuitive perspective. For instance, a join operator may have multiple state implementations (depending on whether the join key includes unique attributes), but users primarily care about the data associated with a specific join key across input streams.
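
The layout described in this message (one catalog per savepoint, one database per operator, one table per state) implies three-part table names. A minimal Java sketch of how such a name could be assembled follows. The class and method names are hypothetical, the `savepoint-` catalog prefix mirrors the `savepoint-${id}` example that appears later in this message, and escaping a backtick by doubling it is an assumption about the SQL dialect.

```java
// Hypothetical naming helper, not a Flink API: builds the fully qualified
// name of a state table under the proposed catalog layout.
class StateTablePath {

    // Wraps an identifier in backticks; doubling embedded backticks is an assumption.
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // catalog = savepoint, database = operator, table = state.
    static String of(String savepointId, String operatorName, String stateName) {
        return String.join(".",
                quote("savepoint-" + savepointId),
                quote(operatorName),
                quote(stateName));
    }

    public static void main(String[] args) {
        // Made-up identifiers for illustration only.
        System.out.println(of("42", "multi-join", "left-input-state"));
    }
}
```

Quoting every part keeps names with hyphens, such as `savepoint-42`, usable as identifiers.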
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Returning to the one-to-one mapping between savepoints and catalogs, we aim to manage multiple user state catalogs through a catalog store. When a user triggers a savepoint for a job on the platform:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1. The platform sends a REST request to the JobManager.
>>>>>>>>>>>>>>>>>>>>>> 2. Simultaneously, it registers a new state catalog in the catalog store, enabling immediate analysis of state data on the platform.
>>>>>>>>>>>>>>>>>>>>>> 3. Deleting a savepoint would also trigger the removal of its associated catalog.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This vision assumes that states are self-describing or that a state metaservice is introduced to analyze savepoint structures.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> How can users create logic to identify differences between multiple savepoints?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Since savepoints and state catalogs are one-to-one mapped, users can query metadata via their respective catalogs.
For example:

1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>` provides operator-specific metadata (e.g., state size, type).
2. Comparing metadata tables (e.g., schema versions, state entry counts) across catalogs reveals structural or quantitative differences.
3. For deeper analysis, users could write SQL queries to compare specific state partitions or leverage the metaservice to track state evolution (e.g., added/removed operators, modified state configurations).

If we plan to introduce a state catalog in the future, I would lean toward using metadata tables. If a utility tool can address the challenges we face, could we avoid introducing an additional connector?

Best,
Shengkai

On Mon, Mar 17, 2025 at 8:25 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi All!

Without going into too much detail, here are my 2 cents regarding the virtual column / catalog metadata / table (connector) discussion for the state metadata.

State metadata such as the types of states, their properties, names, sizes, etc. is valuable information that can be used to enrich the computations we do on state. We can analyze it standalone (for example to discover anomalies in large jobs with many states), across multiple savepoints (to discover how state changed over time), or by joining it with keyed or non-keyed state data to serve more complex queries on the state.

The only solution that seems to serve all these use cases and requirements in a straightforward and SQL-canonical way is to simply expose the state metadata as a separate table.
This is a metadata table, but you can also think of it as a data table; it makes no practical difference here.

Once we have a catalog later, the catalog can offer this table out of the box, the same way databases provide metadata tables. For this to work, however, we need another, simpler connector that creates this table.

+1 for state metadata as a separate connector/table, instead of adding virtual columns and ad hoc catalog metadata that is hard to use in a large number of queries.

Cheers,
Gyula

On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

1. State TTL for Value Columns

> I’m planning on adding this, and we may collaborate on it in the future.

+1 on this, just ping me.

2. Metadata Table vs. Metadata Column

After some code digging and a POC, all I can say is that with heavy effort we can maybe add changes that let us show the metadata of a savepoint from the catalog. I'm not against that, but from the user perspective this has limited value; let me explain why.

From a high-level perspective I see the following points, on which I believe we agree:
* We should have a catalog which represents the savepoint data set of one or more jobs (future plan)
* Savepoints should be registrable in the catalog, where they then appear as databases (future plan)
* There must be a possibility to create tables from databases where users can read state data (exists already)

In terms of metadata, if I understand correctly, the suggested approach would be to access it from the catalog describe command, right? Adding that info when a specific database describe command is executed could be done.

The question is, for instance, how users can create logic that tells them what the difference is between multiple savepoints.
Just to give some examples:
* per-operator size changes between savepoints
* show values from operator data where state size reaches a boundary
* in general, "find which checkpoint ruined things" is quite a common pattern
What I would like to highlight here is that from Flink's point of view the metadata can be considered static side-output information, but for users these values are actual real data around which they plan to build logic.

> The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill.

State data is also static within a savepoint, and that's the reason why the state processor API works in batch mode. When we handle multiple checkpoints in a streaming fashion, this can be viewed from another angle.

We can come up with a more lightweight solution than a new connector, but forcing users to parse the catalog describe command output in order to compare multiple savepoints doesn't sound like a smooth user experience. Honestly, I have no other idea how to expose metadata as real user data, so I'm waiting for other approaches.

BR,
G

On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com> wrote:

Looking forward to hearing the good news!

Best,
Shengkai

On Wed, Mar 12, 2025 at 10:24 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Thanks to both of you for the valuable input!

Let me take a closer look at the suggestions, like the Catalog capabilities and the possibility of embedding TypeInformation or StateDescriptor metadata directly into the raw state files...

BR,
G

On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:

Thanks for Zakelly's clarification.

1. State TTL for Value Columns

+1 to delay the discussion about this.

2. Metadata Table vs. Metadata Column

I’d like to share my perspective on the State Catalog proposal.
While introducing this capability is beneficial, there is a blocker: the current StateBackend architecture does not permit operators to encode TypeInformation into the state; it only preserves the Serializer. This limitation creates an asymmetry, as operators alone retain knowledge of the data structure’s schema.

To address this, I suggest allowing operators to embed TypeInformation or StateDescriptor metadata directly into the raw state files. Such a design would enable the Catalog to:

1. Parse state files and programmatically derive the schema and structural guarantees for each state.
2. Leverage existing Flink Table utilities, such as LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to bridge TypeInformation and DataType conversions.

If we cannot store the TypeInformation or StateDescriptor in the raw state files, I am +1 for this FLIP to use a metadata column to retrieve the information.

Best,
Shengkai

On Wed, Mar 12, 2025 at 12:43 PM Zakelly Lan <zakelly....@gmail.com> wrote:

Hi Gabor and Shengkai,

Thanks for sharing your thoughts! This is a long discussion, and sorry for the late reply (I'm busy catching up with release 2.0 these days).

1. State TTL for Value Columns

Let me first clarify your thoughts to ensure I understand correctly. IIUC, there is no persistent configuration for state TTL in the checkpoint. While you can infer that TTL is enabled by reading the serializer, the checkpoint itself only stores the last access time for each value. So the only thing we can show is the last access time for each value. But not all state backends are required to store this, as they may directly store the expiration time. This will also increase the difficulty of implementation & maintenance.
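
To illustrate the TTL point above, here is a minimal sketch, assuming a reader that sees only a value and its last access time. `TtlEntry` and `expires_at_ms` are hypothetical names for illustration, not Flink classes; the sketch only shows that an expiration time cannot be derived unless the TTL duration is supplied from somewhere outside the checkpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TtlEntry:
    """Hypothetical view of a stored value plus its last-access timestamp."""
    value: str
    last_access_ms: int


def expires_at_ms(entry: TtlEntry, ttl_ms: Optional[int]) -> Optional[int]:
    """Return the expiration time, or None when the TTL duration is unknown."""
    if ttl_ms is None:
        # Only the last access time is available, so nothing can be derived.
        return None
    return entry.last_access_ms + ttl_ms


entry = TtlEntry(value="user-42", last_access_ms=1_741_750_000_000)
print(expires_at_ms(entry, None))       # TTL not known to the reader
print(expires_at_ms(entry, 3_600_000))  # TTL supplied externally (1 hour)
```

A backend that stores the expiration time directly would make the second argument unnecessary, which is why a single exposed column cannot be assumed to mean the same thing everywhere.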

This once again underlines the importance of unified metadata for checkpoints. I’m planning on adding this, and we may collaborate on it in the future.

2. Metadata Table vs. Metadata Column

I'm not in favor of adding a new connector for metadata. The metadata is more like one-time information than streaming data that changes all the time, so a dedicated connector seems to be overkill. It is not easy to withdraw a connector if we have a better solution in the future.
I'm not familiar with the current Catalog capabilities, and if it could extract and show some operator-level information from a savepoint, that would be great.

If the Catalog can't do that, I would consider the current FLIP to be a compromise solution.

And if we have that unified metadata for checkpoints/savepoints in the future, we may directly register a savepoint in the catalog, create a source without specifying complex columns, and describe the savepoint catalog to get the metadata. That's a good solution in my mind.

Best,
Zakelly

On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi Gabor,

> 2. Adding a new connector with `savepoint-metadata`

I would argue against introducing a new connector type named savepoint-metadata, as the existing Catalog mechanism can inherently provide the necessary connector factory capabilities. I’ve detailed this proposal in branch[1]. Please take a moment to review it.

If we introduce a connector named `savepoint-metadata`, it means users can create a temporary table with the connector `savepoint-metadata`, and the connector needs to check whether the table schema is the same as the schema we proposed in the FLIP. On the other hand, it's not easy for others to offer users a metadata table with the same schema.

[1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63

Best,
Shengkai

On Tue, Mar 11, 2025 at 4:56 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Hi Shengkai,

> 1. State TTL for Value Columns

From a directional perspective I agree with your idea of how it can be implemented.
Previously I mentioned that TTL information is not exposed on the state processor API (which the SQL state connector uses to read data), and unless somebody shows me otherwise, this FLIP is not going to address it, to avoid feature creep. Our users are also interested in TTL, so sooner or later we're going to expose it; this is a matter of scheduling.

> 2. Adding a new connector with `savepoint-metadata`

I'm not sure I understand your point related to StateCatalog.
First of all, I can't agree more that StateCatalog is needed and is a planned building block in an upcoming FLIP, but I'm not sure how it can help now. No matter what, your knowledge is essential when we add StateCatalog. Let me lay out my understanding in this area:
* First we need create table statements to access state data and metadata
* Once we have that, we can add StateCatalog, which could potentially ease the life of users, for example by giving off-the-shelf tables without sweating over create table statements

User expectations:
* See state data (this is fulfilled by the existing connector)
* See metadata about state data like TTL (this can be added as a metadata column as you suggested, since it belongs to the data)
* See metadata about operators (this can be added from savepoint-metadata)

It is important to highlight that the state data table format differs from the state metadata table format. Namely, one table has rows for state values and the other has rows for operators, right?
>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I think that's the reason why you've >> >>> > > > > > >>>>> pinpointed >> >>> > > > > > >>>>>>> out >> >>> > > > > > >>>>>>>>>> that >> >>> > > > > > >>>>>>>>>>>> the >> >>> > > > > > >>>>>>>>>>>>>>>>> suggested >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> metadata columns are somewhat clunky. >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> As a conclusion I agree to add >> >>> > > > > > >>>>> ${state-name}_ttl >> >>> > > > > > >>>>>>>>>> metadata >> >>> > > > > > >>>>>>>>>>>>>> column >> >>> > > > > > >>>>>>>>>>>>>>>>> later >> >>> > > > > > >>>>>>>>>>>>>>>>>> on >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> since it belongs to the state value and >> >>> > > > > > >>>>> adding a >> >>> > > > > > >>>>>>>> new >> >>> > > > > > >>>>>>>>>>> table >> >>> > > > > > >>>>>>>>>>>>> type >> >>> > > > > > >>>>>>>>>>>>>>>> (like >> >>> > > > > > >>>>>>>>>>>>>>>>>> you >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> suggested similar to PG [1]) >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> for metadata. Please see how Spark does >> >>> > > > > > >>>> that >> >>> > > > > > >>>>> too >> >>> > > > > > >>>>>>>> [2]. >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> If you have better approach then please >> >>> > > > > > >>>>>>> elaborate >> >>> > > > > > >>>>>>>>> with >> >>> > > > > > >>>>>>>>>>> more >> >>> > > > > > >>>>>>>>>>>>>>> details >> >>> > > > > > >>>>>>>>>>>>>>>>> and >> >>> > > > > > >>>>>>>>>>>>>>>>>>>> help me to understand your point. 
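As a rough illustration of the `${state-name}_ttl` naming convention discussed here, the sketch below derives one TTL metadata column name per state name. It is plain Python, not Flink code: the function name `ttl_metadata_columns`, the example state names, and the `BIGINT` type string are assumptions made for illustration and are not taken from FLIP-512 or the Flink API.

```python
def ttl_metadata_columns(state_names):
    """Map each state name to a hypothetical '<state-name>_ttl' metadata column.

    The 'BIGINT' type string is an assumption for illustration only; the
    thread does not specify the data type of the TTL column.
    """
    columns = {}
    for name in state_names:
        if not name:
            raise ValueError("state name must be non-empty")
        columns[f"{name}_ttl"] = "BIGINT"
    return columns


if __name__ == "__main__":
    # Hypothetical state names; a real connector would take them from the
    # table's columns and configuration.
    print(ttl_metadata_columns(["KeyedPrimitiveValue", "KeyedMapState"]))
```

In a real connector the equivalent logic would presumably sit behind the metadata-listing hook mentioned in the thread; that wiring is left out here on purpose.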
>
>> Up until now we've seen even in TB savepoints that the number of keys
>> can be extremely huge but not the per key state itself.
>> But again, this is a good feature as-is and can be handled in a
>> separate jira.
>
> I've just created https://issues.apache.org/jira/browse/FLINK-37456.
>
> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
> [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
>
> BR,
> G
>
>
> On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
>> Hi, Gabor. Thanks for your response.
>>
>>> 1. State TTL for Value Columns
>>
>> Thank you for addressing the limitations here. However, I believe it
>> would be beneficial to further clarify the API in this FLIP regarding
>> how users can specify the TTL column.
>>
>> One potential approach that comes to mind is using a standardized
>> naming convention such as ${state-name}_ttl for the metadata column
>> that defines the TTL value. In terms of implementation, the
>> listReadableMetadata function could:
>>
>> 1. Read the table's columns and configuration,
>> 2. Extract all defined state names, and
>> 3. Return a structured list of metadata entries formatted as
>>    ${state-name}_ttl.
>>
>> WDYT?
>>
>>> 2. Adding a new connector with `savepoint-metadata`
>>
>> Introducing a new connector type at this stage may unnecessarily
>> complicate the system. Given that every table already belongs to a
>> Catalog, which is designed to provide a Factory for building source or
>> sink connectors, I propose integrating a dedicated StateCatalog
>> instead. This approach would allow us to:
>>
>> 1. Leverage the Catalog's existing capabilities to manage TTL metadata
>>    (e.g., state names and TTL logic) without duplicating functionality.
>> 2. Provide a unified interface for connector instantiation and
>>    metadata handling through the Catalog's Factory pattern.
>>
>> Would this design decision better align with our architecture's
>> extensibility and reduce redundancy?
>>
>>> Up until now we've seen even in TB savepoints that the number of keys
>>> can be extremely huge but not the per key state itself.
>>> But again, this is a good feature as-is and can be handled in a
>>> separate jira.
>>
>> +1 for a separate jira.
>>
>> Best,
>> Shengkai
>>
>> On Mon, Mar 10, 2025 at 19:05 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>
>>> Hi Shengkai,
>>>
>>> Please see my comments inline.
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>
>>>> Hi, Gabor. Thanks for the FLIP. I have some questions about the
>>>> FLIP:
>>>>
>>>> 1. State TTL for Value Columns
>>>> How can users retrieve the state TTL (Time-to-Live) for each value
>>>> column? From my understanding of the current design, it seems that
>>>> this functionality is not supported. Could you clarify if there are
>>>> plans to address this limitation?
>>>>
>>>
>>> Since the state processor API is not yet exposing this information,
>>> this would require several steps.
>>> First, the state processor API support needs to be added, which can
>>> then be exposed on the SQL API.
>>> This is definitely a useful future improvement and can be handled in
>>> a separate jira.
>>>
>>>
>>>> 2. Metadata Table vs. Metadata Column
>>>> The metadata information described in the FLIP appears to be
>>>> intended to describe the state files stored at a specific location.
>>>> To me, this concept aligns more closely with system tables like
>>>> pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
>>>>
>>>
>>> Adding a new connector with `savepoint-metadata` is a possibility
>>> where we can create such functionality.
>>> I'm not against that, I just want to have a common agreement that we
>>> would like to move in that direction.
>>> (As a side note, not just PG but Spark also has a similar approach
>>> and I basically like the idea.)
>>> If we go in that direction, savepoint metadata can be exposed in a
>>> way that one row represents an operator with its values, something
>>> like this:
>>>
>>> ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┐
>>> │operatorN│operatorU│operatorH│paralleli│maxParall│subtaskSt│coordinat│totalSta│
>>> │ame      │id       │ash      │sm       │elism    │atesCount│orStateSi│tesSizeI│
>>> │         │         │         │         │         │         │zeInBytes│nBytes  │
>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>> │Source:  │datagen-s│47aee9439│2        │128      │2        │16       │546     │
>>> │datagen-s│ource-uid│4d6ea26e2│         │         │         │         │        │
>>> │ource    │         │d544bef0a│         │         │         │         │        │
>>> │         │         │37bb5    │         │         │         │         │        │
>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>> │long-udf-│long-udf-│6ed3f40bf│2        │128      │2        │0        │0       │
>>> │with-mast│with-mast│f3c8dfcdf│         │         │         │         │        │
>>> │er-hook  │er-hook-u│cb95128a1│         │         │         │         │        │
>>> │         │id       │018f1    │         │         │         │         │        │
>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>> │value-pro│value-pro│ca4f5fe9a│2        │128      │2        │0        │40726   │
>>> │cess     │cess-uid │637b656f0│         │         │         │         │        │
>>> │         │         │9ea78b3e7│         │         │         │         │        │
>>> │         │         │a15b9    │         │         │         │         │        │
>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>>
>>> This table can then be joined with the tables created by the already
>>> existing `savepoint` connector, based on the UID hash (which is
>>> unique and always exists).
>>> This would mean that the already existing table would need only a
>>> single metadata column, which is the UID hash.
>>> WDYT?
>>> @zakelly, plz share your thoughts too.
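To make the proposed join on the UID hash concrete, here is a small sketch that uses SQLite (via Python's standard `sqlite3` module) as a stand-in for Flink SQL. The operator rows are copied from the example table above; the table names `operator_metadata` and `keyed_state`, the column names, and the state rows are hypothetical and only illustrate the shape of the join, not the actual connector schema.

```python
import sqlite3

# Operator rows copied from the example table in the thread
# (name, uid, uid hash, parallelism, max parallelism).
OPERATORS = [
    ("Source: datagen-source", "datagen-source-uid",
     "47aee94394d6ea26e2d544bef0a37bb5", 2, 128),
    ("value-process", "value-process-uid",
     "ca4f5fe9a637b656f09ea78b3e7a15b9", 2, 128),
]

# Hypothetical state rows; the uid hash plays the role of the single
# metadata column that the existing `savepoint` table would expose.
STATE_ROWS = [
    ("ca4f5fe9a637b656f09ea78b3e7a15b9", 1, "a"),
    ("ca4f5fe9a637b656f09ea78b3e7a15b9", 2, "b"),
]


def build_demo_db():
    """Create an in-memory database holding both illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE operator_metadata ("
        "operator_name TEXT, operator_uid TEXT, operator_uid_hash TEXT, "
        "parallelism INTEGER, max_parallelism INTEGER)"
    )
    conn.execute(
        "CREATE TABLE keyed_state (operator_uid_hash TEXT, k INTEGER, v TEXT)"
    )
    conn.executemany(
        "INSERT INTO operator_metadata VALUES (?, ?, ?, ?, ?)", OPERATORS
    )
    conn.executemany("INSERT INTO keyed_state VALUES (?, ?, ?)", STATE_ROWS)
    return conn


def state_with_operator_info(conn):
    """Join every state row to its operator row via the uid hash."""
    query = (
        "SELECT s.k, s.v, m.operator_uid, m.parallelism "
        "FROM keyed_state AS s "
        "JOIN operator_metadata AS m "
        "ON s.operator_uid_hash = m.operator_uid_hash "
        "ORDER BY s.k"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in state_with_operator_info(build_demo_db()):
        print(row)
```

The point of the sketch is only that operator-level values live once per operator in the metadata table, while the state table carries just the hash needed to look them up.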
>>>
>>>
>>>> If we opt to use metadata columns, every record in the table would
>>>> end up having identical values for these columns (please correct me
>>>> if I'm mistaken). On the other hand, the state connector requires
>>>> users to specify an operator UID or operator UID hash, after which
>>>> it outputs user-defined values in its records. This approach feels
>>>> somewhat redundant to me.
>>>>
>>>
>>> If we add a new `savepoint-metadata` connector, then this can be
>>> addressed.
>>> On the other hand, UID and UID hash have an either-or relationship
>>> from a config perspective, so when a user provides the UID, he/she
>>> may be interested in the hash for further calculations (the whole of
>>> Flink's internals depends on the hash). Printing out the
>>> human-readable UID is an explicit requirement from the user side
>>> because hashes are not human readable.
>>>
>>>
>>>> 3. Handling LIST and MAP States in the State Connector
>>>> I have concerns about how the current design handles LIST and MAP
>>>> states. Specifically, the state connector uses Flink SQL's MAP and
>>>> ARRAY types, which implies that it attempts to load entire MAP or
>>>> LIST states into memory.
>>>>
>>>> However, in many real-world scenarios, these states can grow very
>>>> large. Typically, the state API addresses this by providing an
>>>> iterator to traverse elements within the state incrementally. I'm
>>>> unsure whether I've missed something in FLIP-496 or FLIP-512, but
>>>> it seems that the current design might struggle with scalability in
>>>> such cases.
>>>>
>>>
>>> You've got it right, the current implementation keeps the state for
>>> a single key in memory.
>>> Back in the day we considered this potential issue and concluded
>>> that this is not necessarily needed for the initial version and can
>>> be done as a later improvement.
>>>
>>> Up until now we've seen even in TB savepoints that the number of
>>> keys can be extremely huge but not the per key state itself.
>>> But again, this is a good feature as-is and can be handled in a
>>> separate jira.
>>>
>>>
>>>>
>>>> Best,
>>>> Shengkai
>>>>
>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>>>>
>>>> On Mon, Mar 3, 2025 at 02:00 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>
>>>>> Hi Zakelly,
>>>>>
>>>>> To keep things simple, `METADATA VIRTUAL` as keywords for the
>>>>> definition is the target.
>>>>> If it's not super complex, the latter can be added too.
>>>>>
>>>>> BR,
>>>>> G
>>>>>
>>>>>
>>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>>>>>
>>>>>> Hi Gabor,
>>>>>>
>>>>>> +1 for this.
>>>>>>
>>>>>> Will the metadata column use `METADATA VIRTUAL` as keywords for
>>>>>> the definition, or `METADATA FROM xxx VIRTUAL` for renaming, just
>>>>>> like the Kafka table?
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Zakelly
>>>>>>
>>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I'd like to start a discussion of FLIP-512: Add meta information
>>>>>>> to SQL state connector [1].
>>>>>>> Feel free to add your thoughts to make this feature better.
>>>>>>>
>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector