Just found out that PTF is not supported in batch mode; please see the dev mailing list thread about it [1].

[1] https://lists.apache.org/thread/ytm9m1qt4pq2q2gjngfktrn8vrlvkf07

BR,
G

On Thu, Mar 27, 2025 at 3:38 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

In the meantime, I've updated the FLIP accordingly, to be optimistic.

BR,
G

On Thu, Mar 27, 2025 at 2:15 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Considering all the facts, I'm also +1 on PTF. Even if something is missing, we can add it later.

@Zakelly Lan <zakelly....@gmail.com> @Shengkai Fang, are you also on the same page, or do you have something to add?

BR,
G

On Thu, Mar 27, 2025 at 1:50 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:

+1 for PTF

> Is it possible to describe such function to see the column names/types?

Although Flink SQL does not directly support this feature, users can achieve similar results with the `explain` syntax, e.g.
'explain select * from read_state_metadata(...)'

Best,
Lincoln Lee

Gyula Fóra <gyula.f...@gmail.com> wrote on Thu, Mar 27, 2025 at 20:41:

Hey!

I think the PTF approach strikes a great balance between simplicity and the capabilities we get out of it.

I think this could be a completely viable alternative to the dedicated connector, +1.

Cheers,
Gyula

On Thu, Mar 27, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi, Gabor.

> Do I understand correctly that this is a 2.x-only feature and we can't
> backport it to the 1.x line?

Yes. PTF is only supported in 2.x versions.

> Is it possible to describe such function to see the column names/types?

Flink SQL doesn't support this feature, but PostgreSQL [2] and MySQL [1] have a similar feature.

[1] https://dev.mysql.com/doc/refman/8.4/en/show-create-procedure.html
[2] https://stackoverflow.com/questions/6898453/show-the-code-of-a-function-procedure-and-trigger-in-postgresql

Best,
Shengkai

Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Thu, Mar 27, 2025 at 16:25:

Hi Shengkai,

Thanks for your effort with the example; this looks promising. I like the fact that users wouldn't need to sweat over complex CREATE TABLE statements.

A couple of questions:
* Do I understand correctly that this is a 2.x-only feature and we can't backport it to the 1.x line? I don't intend to do any backport; I'd just like to know the technical constraints.
* Is it possible to describe such a function to see the column names/types?

BR,
G

On Thu, Mar 27, 2025 at 3:17 AM Shengkai Fang <fskm...@gmail.com> wrote:

Many thanks for the reminder, Leonard. Here's the link I mentioned [1].

Best,
Shengkai

[1] https://github.com/apache/flink/pull/26358

Leonard Xu <xbjt...@gmail.com> wrote on Thu, Mar 27, 2025 at 10:05:

Your link is broken, Shengkai.

Best,
Leonard

On Mar 27, 2025, at 10:01, Shengkai Fang <fskm...@gmail.com> wrote:

Hi, All.

I wrote a simple demo to illustrate my idea. Hope this helps.

Best,
Shengkai

https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1

Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Wed, Mar 26, 2025 at 15:54:

> I'm fine with a separate SQL connector for metadata, so maybe we could
> update the FLIP about our discussion?

Sorry, I forgot this part. Yes, whichever way we choose, I'm going to update the FLIP.

G

On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Hi All,

I also lack knowledge of PTF, so I've read just the motivation part:

"The SQL 2016 standard introduced a way of defining custom SQL operators defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic table functions). ~200 pages define how this new kind of function can consume and produce tables with various execution properties. Unfortunately, this part of the standard is not publicly available."

Of course we can take a look at some examples, but do we really want to expose state data with this construct, which is described in ~200 pages, part of which is not publicly available?
I mean, the dataset is a couple of rows, and the use case is joining it with another table, such as state data. If somebody can point out advantages I would buy that, but from my limited understanding this would be overkill here.

BR,
G

On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi Zakelly, Shengkai!

I don't know too much about PTFs; it would be interesting to see how the usage would look in practice.

Do you have a mockup/example in mind of how the PTF would look, for example, when we want to:
- Simply display/aggregate what's in the metadata
- Join keyed state with some metadata columns

Thanks,
Gyula

On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <zakelly....@gmail.com> wrote:

Hi everyone,

I'm fine with a separate SQL connector for metadata, so maybe we could update the FLIP with our discussion? Also, Shengkai has provided a PTF implementation; does that also meet the requirement?

Best,
Zakelly

On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Hi All,

@Zakelly: Gyula summarised correctly what I meant, so please treat that content as mine. In addition, I'm not against adding a CLI at all; I'm just stating that in some cases like this, users would like a self-service solution where they can provide SQL statements that trigger alerts automatically.

My personal opinion is that a CLI would be beneficial in several cases. A good example is when users want to restart a job from specific Kafka offsets persisted in a savepoint. In such a scenario users are more than happy with a CLI, since they expect manual intervention with full control. So all in all, you can count on my +1 when a CLI FLIP comes up...

BR,
G

On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi!

@Zakelly Lan <zakelly....@gmail.com>
I think what Gabor means is that users want predefined SQL scripts to perform state analysis tasks to debug/identify problems, such as a SQL script that joins the metadata table with the state and runs some analytics on it.

If we have a metadata table, then the SQL script that does this is fixed, and users can trigger it on demand by simply providing a new savepoint path.

If we have a different mechanism to extract metadata that is not SQL-native, then manual steps need to be executed, and a custom SQL script would need to be written that adds the manually extracted metadata into the script.

Cheers,
Gyula

On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com> wrote:

Hi all,

Thanks for your answers! Getting everyone aligned on this topic is challenging, but it's definitely worth the effort since it will help streamline things moving forward.

@Gabor, are you saying that users use some scripts to define the SQL metadata connector and get the information? If so, would a CLI tool be more convenient? It's easy to invoke and returns the result swiftly. And there should be some other systems to track the checkpoint lineage and analyze whether there are outliers in the metadata (e.g., the state size of one operator), right? Well, maybe I missed something, so please correct me if I'm wrong.

> I think the overall vision in Flink SQL is to provide a SQL native
> environment where we can serve complex use-cases like you would expect in a
> regular database.

@Gyula Well, this is a good point. From the perspective of a comprehensive SQL experience, I'd +1 treating metadata as data. Although I doubt there is a need for processing metadata, I won't be against a separate connector.

Regarding the CLI tool, I still think it's worth implementing.
Such a tool could provide savepoint information before resuming from a savepoint, which would enhance the user experience in CLI-based workflows. It would be good if someone could implement this feature. We shouldn't worry about whether this tool might be retired in the future: regardless of the SQL-based solution we eventually adopt, this capability will remain essential for CLI users. But that is another topic.

Best,
Zakelly

On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi.

After reading the doc [1], I think Spark provides a function for users to consume the metadata from a savepoint. In Flink SQL, similar functionality can be implemented through Polymorphic Table Functions (PTF), as proposed in FLIP-440 [2].
Below is a code example [3] illustrating this concept:

```
// Taken from Flink's planner tests [3]; TestProcessTableFunctionBase and
// collectObjects() are test helpers from Flink's planner test sources.
public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
    // eval() receives the scalar arguments passed by name in the SQL call below.
    public void eval(Integer i, Boolean b) {
        collectObjects(i, b);
    }
}
```

```
-- Calls the PTF registered as f, passing its scalar arguments by name.
INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
```

So we can add a built-in function named `read_state_metadata` to read savepoint data.

Best,
Shengkai

[1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
[3]
https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140

Gyula Fóra <gyula.f...@gmail.com> wrote on Wed, Mar 19, 2025 at 18:37:

Hi All!

Thank you for the answers and concerns from everyone.

On the CLI vs. State Metadata Connector/Table question, I would also like to step back a little and look at the bigger picture.

I think the overall vision in Flink SQL is to provide a SQL native environment where we can serve complex use-cases like you would expect in a regular database. Most features and developments in recent years have gone this way.

The State Metadata Table would be a natural and straightforward fit here. So from my side, +1 for that.

However, I could understand if we are not ready to add a new connector/format due to maintenance concerns (and general concerns about the design).
If that's the issue, then we should spend more time on the design to get comfortable with the approach and seek feedback from the wider community.

I am -1 on the CLI/tooling approach, as it would not provide any features we are looking for that are not already covered by the Java connector. And that approach would come with the same maintenance implications.

Cheers,
Gyula

On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Hi Zakelly, Shengkai,

Several topics are going on, so I'm adding short answers to each. If I haven't touched on a topic, please point it out.

@Shengkai: I've read through all the previous catalog-related FLIPs, and if we'd like to keep the concepts there, then a one-to-one mapping between savepoint and catalog is a reasonable direction. In short, I'm happy that you've highlighted this and agree as a whole.
I've written this down previously, but I just want to double-confirm that a state catalog is essential and planned. When we reach that point, your input is more than welcome.

@Zakelly: We've already tried the CLI and separate-library approaches with users, and they were not well received, for the following reasons:
* Users want automated tasks, not manual parsing of CLI/library output. This can be hacked around, but our experience with that is negative because it's just brittle.
* From a development perspective, it's a much bigger effort than a connector (hard to test; packaging/version handling is an extra layer of complexity; external FS authentication is a pain for users; and users would also be expected to download savepoints)
* Purely personal opinion, but if we find better ways later, retiring a CLI is no more lightweight than retiring a connector

> It would be great if you give some examples on how user could leverage
> the separate connector to process the metadata.

The simplest cases:
* give me the overgrowing state uids
* give me the unknown (new or renamed) state uids
* give me the state uids where the state size dropped drastically compared to a previous savepoint (accidental state loss)

Since it was mentioned, as a general off-topic teaser: yes, it would be good to have some sort of checkpoint/savepoint lineage, or whatever we end up calling it.
Since we haven't reached this point yet, there are no technical details; it's more of a vision. It's a common pattern that jobs are physically running but state processing is somehow stuck, and it would be good to have a way to detect this automatically. The key point here is automation rather than manual evaluation, since handling 10k+ jobs simply doesn't allow for the latter.

BR,
G

On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi, All.

I want to share more thoughts about the state catalog.

In the initial design concept, I understood that a savepoint and a state catalog have a one-to-one mapping relationship. Each operator corresponds to a database, and each state of an operator is represented as an individual table. The rationale behind this design is:

*State Diversity*: An operator may involve multiple types of states.
For example, in our VVR design, a "multi-join" operator uses keyed states for two input streams and a broadcast state for the third stream. This makes it challenging to represent all states of an operator within a single table.

*Scalability*: Internally, an operator might have multiple keyed states (e.g., value state and list state). However, large list states may not fit entirely in memory. To address this, we recommend implementing each state as a separate table.

To resolve the loosely coupled relationships between operator states, we propose embedding predefined views within the catalog. These views simplify user understanding of operator implementations and provide a more intuitive perspective.
For instance, a join operator may have multiple state implementations (depending on whether the join key includes unique attributes), but users primarily care about the data associated with a specific join key across input streams.

Returning to the one-to-one mapping between savepoints and catalogs, we aim to manage multiple user state catalogs through a catalog store. When a user triggers a savepoint for a job on the platform:

1. The platform sends a REST request to the JobManager.
2. Simultaneously, it registers a new state catalog in the catalog store, enabling immediate analysis of state data on the platform.
3. Deleting a savepoint also triggers the removal of its associated catalog.

This vision assumes that states are self-describing or that a state metaservice is introduced to analyze savepoint structures.
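
For illustration only, here is a minimal plain-Java sketch of the mapping and lifecycle described above: one catalog per savepoint named `savepoint-<id>`, one database per operator, one table per state, registered when the savepoint is triggered and removed when it is deleted. It assumes the operator/state structure is already known (i.e. the self-describing states or metaservice mentioned above). All class and method names are hypothetical, nothing here is a Flink API, and the REST call to the JobManager (step 1) is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of the one-to-one savepoint <-> state catalog mapping.
// Not a Flink API: catalog = savepoint, database = operator, table = state.
public class StateCatalogStoreSketch {

    // One catalog per savepoint: operator (database) name -> state (table) names.
    public record StateCatalog(String name, Map<String, List<String>> databases) {

        // Fully qualified `catalog`.`database`.`table` names, one per state.
        public List<String> qualifiedTables() {
            List<String> tables = new ArrayList<>();
            for (Map.Entry<String, List<String>> db : databases.entrySet()) {
                for (String state : db.getValue()) {
                    tables.add("`" + name + "`.`" + db.getKey() + "`.`" + state + "`");
                }
            }
            return tables;
        }
    }

    // Stand-in for a persistent catalog store, keyed by savepoint id.
    private final Map<String, StateCatalog> catalogs = new LinkedHashMap<>();

    // Step 2: register a new state catalog when a savepoint is triggered.
    public StateCatalog onSavepointTriggered(String savepointId,
                                             Map<String, List<String>> statesByOperator) {
        StateCatalog catalog = new StateCatalog("savepoint-" + savepointId,
                Collections.unmodifiableMap(new LinkedHashMap<>(statesByOperator)));
        catalogs.put(savepointId, catalog);
        return catalog;
    }

    // Step 3: deleting a savepoint also removes its catalog.
    public void onSavepointDeleted(String savepointId) {
        catalogs.remove(savepointId);
    }

    public Optional<StateCatalog> catalogFor(String savepointId) {
        return Optional.ofNullable(catalogs.get(savepointId));
    }

    public static void main(String[] args) {
        StateCatalogStoreSketch store = new StateCatalogStoreSketch();
        Map<String, List<String>> states = new LinkedHashMap<>();
        // Modeled on the "multi-join" example: two keyed states plus one broadcast state.
        states.put("multi-join", List.of("left-keyed", "right-keyed", "third-broadcast"));
        store.onSavepointTriggered("42", states).qualifiedTables().forEach(System.out::println);
        store.onSavepointDeleted("42");
        System.out.println(store.catalogFor("42").isPresent());
    }
}
```

Giving each state its own table, rather than one table per operator, is what lets the keyed and broadcast states of the multi-join operator sit side by side in the same database.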

> How can users create logic to identify differences between multiple
> savepoints?

Since savepoints and state catalogs are mapped one-to-one, users can query metadata via their respective catalogs. For example:

1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>` provides operator-specific metadata (e.g., state size, type).
2. Comparing metadata tables (e.g., schema versions, state entry counts) across catalogs reveals structural or quantitative differences.
3. For deeper analysis, users could write SQL queries to compare specific state partitions or leverage the metaservice to track state evolution (e.g., added/removed operators, modified state configurations).

If we plan to introduce a state catalog in the future, I would lean toward using metadata tables. If a utility tool can address the challenges we face, could we avoid introducing an additional connector?
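
As a rough illustration of point 2, and of the use cases Gabor lists above (unknown uids, drastic state-size drops), the comparison boils down to a diff over two per-savepoint metadata snapshots. This plain-Java sketch assumes each snapshot has already been read into a map of operator uid to state size in bytes. The class name and the drop/growth ratios are hypothetical choices for illustration, not part of any proposal in this thread; in the metadata-table design, the same logic would presumably be a SQL join across two `savepoint-${id}` catalogs.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: diff the per-operator-uid state sizes of two savepoints.
public class SavepointMetadataDiff {

    public record Diff(List<String> added, List<String> removed,
                       List<String> shrunk, List<String> grown) {}

    // previous/current map operator uid -> state size in bytes.
    // A uid counts as "shrunk" if its size fell below previous * dropRatio,
    // and as "grown" if it exceeded previous * growRatio.
    public static Diff compare(Map<String, Long> previous, Map<String, Long> current,
                               double dropRatio, double growRatio) {
        List<String> added = current.keySet().stream()
                .filter(uid -> !previous.containsKey(uid))
                .sorted().collect(Collectors.toList());
        // A renamed uid shows up as one removed entry plus one added entry.
        List<String> removed = previous.keySet().stream()
                .filter(uid -> !current.containsKey(uid))
                .sorted().collect(Collectors.toList());
        List<String> shrunk = current.keySet().stream()
                .filter(previous::containsKey)
                .filter(uid -> current.get(uid) < previous.get(uid) * dropRatio)
                .sorted().collect(Collectors.toList());
        List<String> grown = current.keySet().stream()
                .filter(previous::containsKey)
                .filter(uid -> current.get(uid) > previous.get(uid) * growRatio)
                .sorted().collect(Collectors.toList());
        return new Diff(added, removed, shrunk, grown);
    }

    public static void main(String[] args) {
        Map<String, Long> older = Map.of("source", 1_000L, "window-agg", 50_000L,
                "dedup", 20_000L, "join", 80_000L);
        Map<String, Long> newer = Map.of("source", 1_100L, "window-agg", 400_000L,
                "dedup-v2", 20_000L, "join", 10_000L);
        // Flag drops below 50% and growth above 4x (arbitrary example thresholds).
        System.out.println(compare(older, newer, 0.5, 4.0));
    }
}
```

Because the result is structured rather than printed text, it could feed automated alerting directly, which is the property Gabor asks for above.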

Best,
Shengkai

Gyula Fóra <gyula.f...@gmail.com> wrote on Mon, Mar 17, 2025 at 20:25:

Hi All!

Without going into too much detail, here are my 2 cents on the virtual column / catalog metadata / table (connector) discussion for state metadata.

State metadata, such as the types of states, their properties, names, and sizes, is valuable information that can be used to enrich the computations we do on state. We can analyze it standalone (e.g., to discover anomalies in large jobs with many states), across multiple savepoints (to discover how state changed over time), or by joining it with keyed or non-keyed state data to serve more complex queries on the state.
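
For the standalone case (discovering anomalies in large jobs with many states), one simple heuristic is to flag operators whose state is far larger than the typical operator in the same savepoint. The plain-Java sketch below uses "more than `factor` times the median state size" as that rule. The heuristic, the class name, and the inputs are assumptions for illustration, not something proposed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flag operator uids whose state size is an outlier
// within a single savepoint, using a "factor x median" rule.
public class StateSizeOutliers {

    // Median of the values; for an even count, the mean of the two middle values.
    public static double median(List<Long> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no state sizes given");
        }
        List<Long> sorted = new ArrayList<>(values);
        sorted.sort(null); // natural ordering
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // Returns, sorted by uid, the uids whose state size exceeds factor * median.
    public static List<String> outliers(Map<String, Long> sizeByUid, double factor) {
        double threshold = factor * median(new ArrayList<>(sizeByUid.values()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : sizeByUid.entrySet()) {
            if (entry.getValue() > threshold) {
                result.add(entry.getKey());
            }
        }
        result.sort(null);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = Map.of("source", 100L, "filter", 120L, "map", 90L,
                "session-window", 5_000L, "sink", 110L);
        System.out.println(outliers(sizes, 10.0));
    }
}
```

The median is used instead of the mean so that a single oversized operator does not drag the threshold up along with it.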
The only solution that seems to serve all these use cases and requirements in a straightforward and SQL-canonical way is to simply expose the state metadata as a separate table. This is a metadata table, but you can also think of it as a data table; it makes no practical difference here.

Once we have a catalog later, the catalog can offer this table out of the box, the same way databases provide metadata tables. For this to work, however, we need another, simpler connector that creates this table.

+1 for state metadata as a separate connector/table, instead of adding virtual columns and ad hoc catalog metadata that is hard to use in a large number of queries.
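Here is a small sketch of the "join metadata with state data" use case. It models the metadata table and a state data table as lists of rows and joins them on the operator name. Only state values from operators whose state size reaches a limit are kept. All row layouts and field names are hypothetical. With a real metadata table this would presumably be an ordinary SQL JOIN.

```python
from typing import Any

Row = dict[str, Any]


def values_from_large_operators(metadata_rows: list[Row],
                                state_rows: list[Row],
                                size_limit_bytes: int) -> list[Row]:
    """Inner-join state rows with metadata rows on "operator", keeping only
    operators whose state size is at least size_limit_bytes."""
    # Index the (small) metadata table by operator name, pre-filtered by size.
    large = {
        row["operator"]: row["state_size_bytes"]
        for row in metadata_rows
        if row["state_size_bytes"] >= size_limit_bytes
    }
    # Probe with the (large) state table, enriching each hit with its size.
    return [
        {**state, "state_size_bytes": large[state["operator"]]}
        for state in state_rows
        if state["operator"] in large
    ]


metadata = [{"operator": "window", "state_size_bytes": 9_000},
            {"operator": "dedup", "state_size_bytes": 100}]
state = [{"operator": "window", "key": "k1", "value": 3},
         {"operator": "dedup", "key": "k2", "value": 7}]
print(values_from_large_operators(metadata, state, 1_000))
# [{'operator': 'window', 'key': 'k1', 'value': 3, 'state_size_bytes': 9000}]
```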
Cheers,
Gyula

On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

1. State TTL for Value Columns

> I'm planning on adding this, and we may collaborate on it in the future.

+1 on this, just ping me.

2. Metadata Table vs. Metadata Column

After some code digging and a POC, all I can say is that with heavy effort we can maybe add changes that let us show the metadata of a savepoint from the catalog. I'm not against that, but from the user perspective it has limited value. Let me explain why.
From a high-level perspective, I see agreement on the following:
* We should have a catalog which represents the savepoint data set of one or more jobs (future plan)
* Savepoints should be able to be registered in the catalog, where they then become databases (future plan)
* There must be a possibility to create tables from databases where users can read state data (exists already)

In terms of metadata, if I understand correctly, the suggested approach would be to access it from the catalog describe command, right? Adding that info when a specific database describe command is executed could be done.

The question is, for instance: how can users create logic that tells them the difference between multiple savepoints?
Just to give some examples:
* per-operator size changes between savepoints
* show values from operator data where state size reaches a boundary
* in general, "find which checkpoint ruined things" is quite a common pattern

What I would like to highlight here is that, from Flink's point of view, the metadata can be considered static side-output information. For users, however, these values are actual data that they plan to build logic around.

> The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill.

State data is also static within a savepoint, and that's the reason why the State Processor API works in batch mode.
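Once metadata is exposed as rows, the first and third examples above (per-operator size changes, and finding the checkpoint where things went wrong) reduce to simple scans. Below is a minimal Python sketch. It assumes a hypothetical history of (savepoint id, {operator: state size in bytes}) pairs ordered oldest first. None of these names come from Flink.

```python
from typing import Optional

# Hypothetical input: (savepoint_id, {operator_name: state_size_bytes}), oldest first.
History = list[tuple[str, dict[str, int]]]


def size_deltas(history: History) -> list[tuple[str, str, int]]:
    """(savepoint_id, operator, delta) for operators present in a savepoint
    and in the savepoint before it."""
    deltas = []
    for (_, prev), (sp_id, curr) in zip(history, history[1:]):
        for op in sorted(prev.keys() & curr.keys()):
            deltas.append((sp_id, op, curr[op] - prev[op]))
    return deltas


def first_breach(history: History, operator: str, limit_bytes: int) -> Optional[str]:
    """Id of the first savepoint in which `operator` reaches `limit_bytes`, if any."""
    for sp_id, sizes in history:
        if sizes.get(operator, 0) >= limit_bytes:
            return sp_id
    return None


history = [("sp-1", {"window": 100, "sink": 10}),
           ("sp-2", {"window": 400, "sink": 10}),
           ("sp-3", {"window": 2_000, "sink": 12})]
print(size_deltas(history))
print(first_breach(history, "window", 1_000))  # sp-3
```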
When we handle multiple checkpoints in a streaming fashion, this can be viewed from another angle.

We could come up with a more lightweight solution than a new connector. However, forcing users to parse the catalog describe command output in order to compare multiple savepoints doesn't sound like a smooth user experience. Honestly, I have no other idea for how to expose metadata as real user data, so I'm waiting on other approaches.

BR,
G

On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com> wrote:

Looking forward to hearing the good news!

Best,
Shengkai

On Wed, Mar 12, 2025 at 10:24 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Thanks to you both for the valuable input!
Let me take a closer look at the suggestions, like the Catalog capabilities and the possibility of embedding TypeInformation or StateDescriptor metadata directly into the raw state files...

BR,
G

On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:

Thanks for Zakelly's clarification.

1. State TTL for Value Columns

+1 to delay the discussion about this.

2. Metadata Table vs. Metadata Column

I'd like to share my perspective on the State Catalog proposal. While introducing this capability is beneficial, there is a blocker: the current StateBackend architecture does not permit operators to encode TypeInformation into the state; it only preserves the Serializer.
This limitation creates an asymmetry, as operators alone retain knowledge of the data structure's schema.

To address this, I suggest allowing operators to embed TypeInformation or StateDescriptor metadata directly into the raw state files. Such a design would enable the Catalog to:

1. Parse state files and programmatically derive the schema and structural guarantees for each state.
2. Leverage existing Flink Table utilities, such as LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to bridge TypeInformation and DataType conversions.
If we cannot store the TypeInformation or StateDescriptor in the raw state files, I am +1 for this FLIP using a metadata column to retrieve the information.

Best,
Shengkai

On Wed, Mar 12, 2025 at 12:43 PM Zakelly Lan <zakelly....@gmail.com> wrote:

Hi Gabor and Shengkai,

Thanks for sharing your thoughts! This is a long discussion, and sorry for the late reply (I'm busy catching up with release 2.0 these days).

1. State TTL for Value Columns

Let me first restate your thoughts to make sure I understand correctly. IIUC, there is no persistent configuration for state TTL in the checkpoint.
While you can infer that TTL is enabled by reading the serializer, the checkpoint itself only stores the last access time for each value. So the only thing we can show is the last access time for each value. However, state backends are not required to store this, as they may store the expiration time directly instead. This would also make implementation and maintenance harder.

This once again underlines the importance of unified metadata for checkpoints. I'm planning on adding this, and we may collaborate on it in the future.

2. Metadata Table vs. Metadata Column

I'm not in favor of adding a new connector for metadata. The metadata is more like one-time information than streaming data that changes all the time, so a dedicated connector seems like overkill. It is also not easy to withdraw a connector if we find a better solution in the future. I'm not familiar with the current Catalog capabilities, but if the Catalog could extract and show some operator-level information from a savepoint, that would be great.

If the Catalog can't do that, I would consider the current FLIP to be a compromise solution.
And if we have that unified metadata for checkpoints/savepoints in the future, we may directly register a savepoint in the catalog and create a source without specifying complex columns, as well as describe the savepoint catalog to get the metadata. That's a good solution in my mind.

Best,
Zakelly

On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi Gabor,

> 2. Adding a new connector with `savepoint-metadata`

I would argue against introducing a new connector type named savepoint-metadata, as the existing Catalog mechanism can inherently provide the necessary connector factory capabilities.
I've detailed this proposal in a branch [1]. Please take a moment to review it.

If we introduce a connector named `savepoint-metadata`, users can create a temporary table with connector `savepoint-metadata`, and the connector then needs to check whether the table schema matches the schema we proposed in the FLIP. On the other hand, it's not easy for others to offer users a metadata table with the same schema.
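The schema check Shengkai describes for a `savepoint-metadata` connector might look roughly like the Python sketch below. The expected columns and types are hypothetical placeholders, not the schema defined in the FLIP. `schema_mismatches` is an illustrative helper, not a Flink API.

```python
# Hypothetical placeholder schema (column name -> SQL type); the FLIP defines the real one.
EXPECTED_SCHEMA: dict[str, str] = {
    "operator_name": "STRING",
    "operator_uid": "STRING",
    "state_size_bytes": "BIGINT",
}


def schema_mismatches(declared: dict[str, str],
                      expected: dict[str, str] = EXPECTED_SCHEMA) -> list[str]:
    """List every way a user-declared table schema deviates from `expected`.
    Type names are compared case-insensitively; an empty list means it matches."""
    problems = []
    for name, sql_type in expected.items():
        if name not in declared:
            problems.append(f"missing column {name}")
        elif declared[name].upper() != sql_type.upper():
            problems.append(f"column {name}: expected {sql_type}, got {declared[name]}")
    for name in sorted(declared.keys() - expected.keys()):
        problems.append(f"unexpected column {name}")
    return problems


print(schema_mismatches({"operator_name": "string", "state_size_bytes": "INT"}))
# ['missing column operator_uid', 'column state_size_bytes: expected BIGINT, got INT']
```

A connector factory would presumably reject the table definition when this list is non-empty. This kind of validation is part of the overhead that a catalog-provided table could avoid.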
[1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63

Best,
Shengkai

On Tue, Mar 11, 2025 at 4:56 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Hi Shengkai,

> 1. State TTL for Value Columns

From a directional perspective, I agree with your idea of how it can be implemented.
As I mentioned previously, TTL information is not exposed in the State Processor API (which the SQL state connector uses to read data). Unless somebody shows me otherwise, this FLIP is not going to address it, to avoid feature creep. Our users are also interested in TTL, so sooner or later we're going to expose it; it is a matter of scheduling.

> 2. Adding a new connector with `savepoint-metadata`

I'm not sure I understand your point related to StateCatalog at all.
First of all, I couldn't agree more that StateCatalog is needed, and it is a planned building block in an upcoming FLIP, but I'm not sure how it can help now. No matter what, your knowledge is essential when we add StateCatalog. Let me lay out my understanding in this area:
* First we need create table statements to access state data and metadata
* When we have that, we can add StateCatalog, which could potentially make users' lives easier by, for example, providing off-the-shelf tables so they don't have to sweat over create table statements

User expectations:
* See state data (this is fulfilled by the existing connector)
* See metadata about state data, like TTL (this can be added as a metadata column as you suggested, since it belongs to the data)
* See metadata about operators (this can be added from savepoint-metadata)

It is important to highlight that the state data table format differs from the state metadata table format. Namely, one table has rows for state values and the other has rows for operators, right? I think that's the reason why you pointed out that the suggested metadata columns are somewhat clunky.
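The "TTL as a metadata column" bullet above corresponds to the `${state-name}_ttl` convention proposed in this thread. Under that proposal, listReadableMetadata would read the table's columns and configuration, extract the state names, and return one `${state-name}_ttl` entry per state. Below is a minimal Python sketch of just that naming step. The state-name override mapping and the BIGINT type for the TTL column are assumptions for illustration. In Flink, the real hook is the Java method `SupportsReadingMetadata#listReadableMetadata`, which maps metadata keys to DataTypes.

```python
def ttl_metadata_keys(value_columns: list[str],
                      state_name_overrides: dict[str, str],
                      ttl_type: str = "BIGINT") -> dict[str, str]:
    """Map "<state-name>_ttl" -> SQL type for every state behind a value column.

    Assumptions (hypothetical): a column's state name defaults to the column
    name unless state_name_overrides says otherwise, and TTL is exposed as
    ttl_type. Keys follow column order; columns sharing a state collapse."""
    keys: dict[str, str] = {}
    for column in value_columns:
        state_name = state_name_overrides.get(column, column)
        keys.setdefault(f"{state_name}_ttl", ttl_type)
    return keys


print(ttl_metadata_keys(["total", "last_seen"], {"last_seen": "seen-state"}))
# {'total_ttl': 'BIGINT', 'seen-state_ttl': 'BIGINT'}
```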
In conclusion, I agree to add a ${state-name}_ttl metadata column later on, since it belongs to the state value. I also agree to add a new table type for metadata (as you suggested, similar to PG [1]). Please see how Spark does that too [2].

If you have a better approach, please elaborate with more details and help me understand your point.

> Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per key state itself.
> But again, this is a good feature as-is and can be handled in a separate jira.
I've just created https://issues.apache.org/jira/browse/FLINK-37456.

[1] https://www.postgresql.org/docs/current/view-pg-tables.html
[2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source

BR,
G

On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi, Gabor. Thanks for your response.

> 1. State TTL for Value Columns

Thank you for addressing the limitations here.
> However, I believe it would be beneficial to further clarify the API in
> this FLIP regarding how users can specify the TTL column.
>
> One potential approach that comes to mind is using a standardized naming
> convention such as ${state-name}_ttl for the metadata column that defines
> the TTL value. In terms of implementation, the listReadableMetadata
> function could:
>
> 1. Read the table's columns and configuration,
> 2. Extract all defined state names, and
> 3. Return a structured list of metadata entries formatted as
>    ${state-name}_ttl.
>
> WDYT?
>
>> 2. Adding a new connector with `savepoint-metadata`
>
> Introducing a new connector type at this stage may unnecessarily
> complicate the system. Given that every table already belongs to a
> Catalog, which is designed to provide a Factory for building source or
> sink connectors, I propose integrating a dedicated StateCatalog instead.
> This approach would allow us to:
>
> 1. Leverage the Catalog's existing capabilities to manage TTL metadata
>    (e.g., state names and TTL logic) without duplicating functionality.
> 2. Provide a unified interface for connector instantiation and metadata
>    handling through the Catalog's Factory pattern.
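Point 1 of Shengkai's reply describes three steps for deriving `${state-name}_ttl` metadata keys. The following is a minimal Python sketch of that naming convention, not Flink code. It assumes each non-key column is backed by exactly one state that is named after the column unless overridden; the helper name `list_ttl_metadata_keys`, the `state_name_overrides` mapping, and the `BIGINT` type for the TTL column are all hypothetical. In Flink itself, `SupportsReadingMetadata#listReadableMetadata` returns an ordered map from metadata key to `DataType`, which is why the sketch returns an insertion-ordered dict.

```python
from typing import Dict, List, Optional


def list_ttl_metadata_keys(
    columns: List[str],
    key_column: str,
    state_name_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return {metadata_key: sql_type} with one `<state-name>_ttl` entry per value column.

    Hypothetical sketch: every non-key column is assumed to map to one state,
    named after the column unless an override is configured.
    """
    overrides = state_name_overrides or {}
    metadata: Dict[str, str] = {}
    # Step 1: walk the table's columns (the table configuration is modelled as `overrides`).
    for column in columns:
        if column == key_column:
            continue  # the key column is not backed by a state
        # Step 2: resolve the state name behind this value column.
        state_name = overrides.get(column, column)
        # Step 3: expose one TTL metadata entry per state; dicts keep insertion order.
        metadata[f"{state_name}_ttl"] = "BIGINT"  # assumed type, e.g. an expiry timestamp
    return metadata


keys = list_ttl_metadata_keys(["k", "total", "events"], "k", {"events": "event-list"})
print(keys)  # {'total_ttl': 'BIGINT', 'event-list_ttl': 'BIGINT'}
```

Deriving the keys from the column list keeps the metadata surface in sync with the table schema, so users would not have to declare TTL columns by hand beyond referencing the generated names.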
>
> Would this design decision better align with our architecture's
> extensibility and reduce redundancy?
>
>> Up until now we've seen, even in TB-sized savepoints, that the number of
>> keys can be extremely huge, but not the per-key state itself.
>> But again, this is a good feature as-is and can be handled in a separate
>> jira.
>
> +1 for a separate jira.
>
> Best,
> Shengkai
>
> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 10, 2025 at 19:05:
>
>> Hi Shengkai,
>>
>> Please see my comments inline.
>>
>> BR,
>> G
>>
>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>
>>> Hi, Gabor. Thanks for the FLIP. I have some questions about it:
>>>
>>> 1. State TTL for Value Columns
>>> How can users retrieve the state TTL (Time-to-Live) for each value
>>> column? From my understanding of the current design, it seems that this
>>> functionality is not supported. Could you clarify if there are plans to
>>> address this limitation?
>>
>> Since the state processor API does not yet expose this information, this
>> would require several steps.
>> First, support needs to be added to the state processor API, which can
>> then be exposed on the SQL API.
>> This is definitely a useful future improvement and can be handled in a
>> separate jira.
>>
>>> 2. Metadata Table vs. Metadata Column
>>> The metadata information described in the FLIP appears to be intended to
>>> describe the state files stored at a specific location.
>>> To me, this concept aligns more closely with system tables like pg_tables
>>> in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
>>
>> Adding a new connector with `savepoint-metadata` is one way we could
>> provide such functionality.
>> I'm not against that, I just want to reach a common agreement that we
>> would like to move in that direction.
>> (As a side note, not just PG but Spark also has a similar approach, and I
>> basically like the idea.)
>> If we go that direction, savepoint metadata can be exposed in a way that
>> one row represents an operator with its values, something like this:
>>
>> | operatorName              | operatorUid                   | operatorHash                     | parallelism | maxParallelism | subtaskStatesCount | coordinatorStateSizeInBytes | totalStatesSizeInBytes |
>> |---------------------------|-------------------------------|----------------------------------|-------------|----------------|--------------------|-----------------------------|------------------------|
>> | Source: datagen-source    | datagen-source-uid            | 47aee94394d6ea26e2d544bef0a37bb5 | 2           | 128            | 2                  | 16                          | 546                    |
>> | long-udf-with-master-hook | long-udf-with-master-hook-uid | 6ed3f40bff3c8dfcdfcb95128a1018f1 | 2           | 128            | 2                  | 0                           | 0                      |
>> | value-process             | value-process-uid             | ca4f5fe9a637b656f09ea78b3e7a15b9 | 2           | 128            | 2                  | 0                           | 40726                  |
>>
>> This table can then be joined with the tables created by the already
>> existing `savepoint` connector based on the UID
>> hash (which is unique and always exists).
>> This would mean that the already existing table would need only a single
>> metadata column, which is the UID hash.
>> WDYT?
>> @zakelly, plz share your thoughts too.
>>
>>> If we opt to use metadata columns, every record in the table would end up
>>> having identical values for these columns (please correct me if I'm
>>> mistaken).
>>> On the other hand, the state connector requires users to specify an
>>> operator UID or operator UID hash, after which it outputs user-defined
>>> values in its records. This approach feels somewhat redundant to me.
>>
>> If we add a new `savepoint-metadata` connector, then this can be
>> addressed.
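Gabor's proposal is that a `savepoint` connector table only needs the UID hash, while the human-readable UID comes from a join with the proposed `savepoint-metadata` table. The sketch below illustrates that join in plain Python, using the operator rows from the example table (only a subset of its columns). The `(uid_hash, key, value)` shape of the state rows and their values are made up for illustration; in Flink SQL this would presumably be an ordinary join between the two tables.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class OperatorMetadata:
    """One row of the proposed `savepoint-metadata` table (subset of its columns)."""
    operator_name: str
    operator_uid: str
    operator_hash: str
    parallelism: int
    max_parallelism: int


# Operator rows copied from the example table in the thread.
SAVEPOINT_METADATA: List[OperatorMetadata] = [
    OperatorMetadata("Source: datagen-source", "datagen-source-uid",
                     "47aee94394d6ea26e2d544bef0a37bb5", 2, 128),
    OperatorMetadata("long-udf-with-master-hook", "long-udf-with-master-hook-uid",
                     "6ed3f40bff3c8dfcdfcb95128a1018f1", 2, 128),
    OperatorMetadata("value-process", "value-process-uid",
                     "ca4f5fe9a637b656f09ea78b3e7a15b9", 2, 128),
]

# Hypothetical state row shape: (uid_hash, key, value).
StateRow = Tuple[str, int, int]


def join_on_uid_hash(state_rows: Iterable[StateRow],
                     metadata: Iterable[OperatorMetadata]) -> Iterator[Tuple[str, int, int]]:
    """Inner-join state rows with operator metadata on the UID hash, emitting the readable UID."""
    by_hash: Dict[str, OperatorMetadata] = {m.operator_hash: m for m in metadata}
    for uid_hash, key, value in state_rows:
        meta = by_hash.get(uid_hash)
        if meta is not None:  # inner join: drop rows whose hash has no metadata entry
            yield meta.operator_uid, key, value


# Made-up state rows, for illustration only.
rows: List[StateRow] = [
    ("ca4f5fe9a637b656f09ea78b3e7a15b9", 1, 10),
    ("ca4f5fe9a637b656f09ea78b3e7a15b9", 2, 20),
    ("00000000000000000000000000000000", 3, 30),
]
print(list(join_on_uid_hash(rows, SAVEPOINT_METADATA)))
# [('value-process-uid', 1, 10), ('value-process-uid', 2, 20)]
```

Because the hash is always present, it is the natural join key; the readable UID and the size statistics are stored once per operator instead of being repeated on every state record.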
>> On the other hand, UID and UID hash have an either-or relationship from a
>> config perspective, so when a user provides the UID, he/she may be
>> interested in the hash for further calculations (all of Flink's internals
>> depend on the hash). Printing out the human-readable UID is an explicit
>> requirement from the user side because hashes are not human-readable.
>>
>>> 3. Handling LIST and MAP States in the State Connector
>>> I have concerns about how the current design handles LIST and MAP states.
>>> Specifically, the state connector uses Flink SQL's MAP and ARRAY types,
>>> which implies that it attempts to load entire MAP or LIST states into
>>> memory.
>>>
>>> However, in many real-world scenarios, these states can grow very large.
>>> Typically, the state API addresses this by providing an iterator to
>>> traverse elements within the state incrementally. I'm unsure whether I've
>>> missed something in FLIP-496 or FLIP-512, but it seems that the current
>>> design might struggle with scalability in such cases.
>>
>> You see it right: the current implementation keeps the state for a single
>> key in memory.
>> Back in the day we considered this potential issue and concluded that it
>> is not necessarily needed for the initial version and can be done as a
>> later improvement.
>>
>> Up until now we've seen, even in TB-sized savepoints, that the number of
>> keys can be extremely huge, but not the per-key state itself.
>> But again, this is a good feature as-is and can be handled in a separate
>> jira.
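The trade-off discussed in point 3 can be illustrated in plain Python. This is a conceptual sketch only: it does not use Flink's state APIs, and the names `MapState`, `rows_materialized`, `rows_incremental`, and `make_state` are hypothetical. `rows_materialized` mirrors the behaviour described above, where the whole per-key collection is held in memory to fill one MAP/ARRAY value, while `rows_incremental` emits one row per entry, so only one entry is held at a time as long as the underlying entries are produced lazily.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical stand-in for keyed map state: each key maps to an iterable of
# (map_key, map_value) entries, which may be produced lazily.
MapState = Dict[int, Iterable[Tuple[str, int]]]


def rows_materialized(state: MapState) -> Iterator[Tuple[int, Dict[str, int]]]:
    """One row per key; the whole per-key map is built in memory (like a MAP column)."""
    for key, entries in state.items():
        yield key, dict(entries)  # consumes every entry before the row is emitted


def rows_incremental(state: MapState) -> Iterator[Tuple[int, str, int]]:
    """One row per map entry; entries are pulled from the iterator one at a time."""
    for key, entries in state.items():
        for map_key, map_value in entries:
            yield key, map_key, map_value


def make_state() -> MapState:
    # Made-up example data: key 1 has three lazily generated entries, key 2 has one.
    return {1: ((f"item-{i}", i) for i in range(3)), 2: iter([("only", 7)])}


print(list(rows_materialized(make_state())))
# [(1, {'item-0': 0, 'item-1': 1, 'item-2': 2}), (2, {'only': 7})]
print(list(rows_incremental(make_state())))
# [(1, 'item-0', 0), (1, 'item-1', 1), (1, 'item-2', 2), (2, 'only', 7)]

# Even an unbounded per-key state yields its first row; rows_materialized would never return here.
unbounded: MapState = {1: ((f"item-{i}", i) for i in itertools.count())}
print(next(rows_incremental(unbounded)))  # (1, 'item-0', 0)
```

Note that the incremental variant changes the table shape (one row per entry rather than one row per key), which is the kind of API decision that fits the separate jira mentioned above.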
>>>
>>> Best,
>>> Shengkai
>>>
>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>>>
>>> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 3, 2025 at 02:00:
>>>
>>>> Hi Zakelly,
>>>>
>>>> In order to aim for simplicity, `METADATA VIRTUAL` as the keywords for
>>>> definition is the target.
>>>> When it's not super complex, the latter can be added too.
>>>>
>>>> BR,
>>>> G
>>>>
>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>>>>
>>>>> Hi Gabor,
>>>>>
>>>>> +1 for this.
>>>>>
>>>>> Will the metadata column use `METADATA VIRTUAL` as keywords for
>>>>> definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
>>>>> Kafka table?
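In Flink SQL, the two forms Zakelly mentions differ in how the metadata key is resolved: `col TYPE METADATA VIRTUAL` reads the metadata key named after the column itself, while `col TYPE METADATA FROM 'key' VIRTUAL` lets the column carry a different name than the key. `VIRTUAL` marks the column as read-only, so it is excluded when writing to a sink. The toy resolver below is a hypothetical helper, not Flink's SQL parser; it only illustrates that resolution rule on Kafka-style column definitions.

```python
import re
from typing import NamedTuple, Optional


class MetadataColumn(NamedTuple):
    name: str
    sql_type: str
    metadata_key: str
    virtual: bool


# Toy pattern for "<name> <type> METADATA [FROM '<key>'] [VIRTUAL]"; not a full SQL grammar.
_METADATA_COLUMN = re.compile(
    r"^\s*`?(?P<name>\w+)`?\s+(?P<type>.+?)\s+METADATA"
    r"(?:\s+FROM\s+'(?P<key>[^']+)')?(?P<virtual>\s+VIRTUAL)?\s*,?\s*$",
    re.IGNORECASE,
)


def parse_metadata_column(definition: str) -> Optional[MetadataColumn]:
    """Resolve a metadata column definition; returns None for non-metadata columns."""
    m = _METADATA_COLUMN.match(definition)
    if m is None:
        return None
    name = m.group("name")
    # Without FROM, the column name itself is used as the metadata key.
    key = m.group("key") or name
    return MetadataColumn(name, m.group("type"), key, m.group("virtual") is not None)


for definition in [
    "`event_time` TIMESTAMP_LTZ(3) METADATA FROM 'timestamp' VIRTUAL",
    "`offset` BIGINT METADATA VIRTUAL",
    "`user_id` BIGINT",
]:
    print(parse_metadata_column(definition))
# MetadataColumn(name='event_time', sql_type='TIMESTAMP_LTZ(3)', metadata_key='timestamp', virtual=True)
# MetadataColumn(name='offset', sql_type='BIGINT', metadata_key='offset', virtual=True)
# None
```

Supporting only `METADATA VIRTUAL` first, as Gabor suggests, corresponds to the `key = name` branch; `FROM` adds the renaming branch on top without changing anything else.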
>>>>>
>>>>> Best,
>>>>> Zakelly
>>>>>
>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'd like to start a discussion of FLIP-512: Add meta information to
>>>>>> SQL state connector [1].
>>>>>> Feel free to add your thoughts to make this feature better.
>>>>>>
>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
>>>>>>
>>>>>> BR,
>>>>>> G