Hi all,

Given the simplicity, I also +1 for PTF, or for any other function
implementation if PTF is not applicable here.

> I would like to raise a consideration regarding the usage implementation:
> Would it be necessary to allow users to utilize the CREATE FUNCTION
> statement for registering the PTF?

I'd also suggest we make it built-in, without registration.

> Currently, Flink SQL supports letting external systems register modules and
> leverage these modules to centrally manage all function definitions. Given
> this architectural approach, I’m curious if the plan involves introducing
> additional functions in the future. If so, I would advocate for introducing
> a dedicated state module to centralize such management. This would empower
> users to:

I can’t think of any further functions for now, but I'd +1 for a module if
it lets us omit the registration.

Best,
Zakelly

On Fri, Mar 28, 2025 at 10:25 AM Shengkai Fang <fskm...@gmail.com> wrote:

> One more question about the FLIP.
>
> I think the output schema is definitely a public API for users. If users
> use the `CREATE FUNCTION` statement, does that mean the class path is also
> a public API? Alternatively, is this merely an experimental feature for
> which we make no promises?
>
> Best,
> Shengkai
>
> On Fri, Mar 28, 2025 at 10:20 Shengkai Fang <fskm...@gmail.com> wrote:
>
> +1 to use PTF.
>
> I would like to raise a consideration regarding the usage implementation:
> Would it be necessary to allow users to utilize the CREATE FUNCTION
> statement for registering the PTF?
>
> Currently, Flink SQL supports letting external systems register modules
> and leverage these modules to centrally manage all function definitions.
> Given this architectural approach, I’m curious if the plan involves
> introducing additional functions in the future. If so, I would advocate for
> introducing a dedicated state module to centralize such management. This
> would empower users to:
>
> 1. Simply execute the LOAD MODULE command to load the required module, and
> 2. Directly invoke read_metadata thereafter.
>
> For more details about modules, please refer to this document [1].
>
> Best,
> Shengkai
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/
>
> On Fri, Mar 28, 2025 at 00:26 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> Just found out that PTF is not supported in batch mode; please see the dev
> mailing list thread about it [1].
>
> [1] https://lists.apache.org/thread/ytm9m1qt4pq2q2gjngfktrn8vrlvkf07
>
> BR,
> G
>
> On Thu, Mar 27, 2025 at 3:38 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> In the meantime I've updated the FLIP accordingly, to be optimistic 🙂
>
> BR,
> G
>
> On Thu, Mar 27, 2025 at 2:15 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> Considering all the facts, I also +1 on PTF. Even if something is missing,
> we can add it later.
>
> @Zakelly Lan <zakelly....@gmail.com> @Shengkai Fang are you also on the
> same page, or do you have something to add?
>
> BR,
> G
>
> On Thu, Mar 27, 2025 at 1:50 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:
>
> +1 for PTF
>
> > Is it possible to describe such function to see the column names/types?
>
> Although Flink SQL does not directly support this feature, users can
> achieve similar results with the help of the `explain` syntax, e.g.
> 'explain select * from read_state_metadata(...)'
>
> Best,
> Lincoln Lee
>
> On Thu, Mar 27, 2025 at 20:41 Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> Hey!
>
> I think the PTF approach strikes a great balance between simplicity and
> the capabilities that we get out of it.
>
> I think this could be a completely viable alternative to the dedicated
> connector, +1.
>
> Cheers,
> Gyula
>
> On Thu, Mar 27, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> Hi, Gabor.
>
> > Do I understand correctly that this is 2.x only feature and we can't
> > backport it to 1.x line
>
> Yes. PTF is only supported in the 2.x versions.
>
> > Is it possible to describe such function to see the column names/types?
>
> Flink SQL doesn't support this feature, but PostgreSQL [2] and MySQL [1]
> have a similar feature.
>
> [1] https://dev.mysql.com/doc/refman/8.4/en/show-create-procedure.html
> [2] https://stackoverflow.com/questions/6898453/show-the-code-of-a-function-procedure-and-trigger-in-postgresql
>
> Best,
> Shengkai
>
> On Thu, Mar 27, 2025 at 16:25 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> Hi Shengkai,
>
> Thanks for your effort with the example, this looks promising.
> I like the fact that users wouldn't need to sweat over complex create
> table statements.
>
> A couple of questions:
> * Do I understand correctly that this is a 2.x-only feature and we can't
>   backport it to the 1.x line?
>   I don't intend to do any backport; I would just like to know the
>   technical constraints.
> * Is it possible to describe such a function to see the column
>   names/types?
>
> BR,
> G
>
> On Thu, Mar 27, 2025 at 3:17 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> Many thanks for your reminder, Leonard.
> Here's the link I mentioned [1].
>
> Best,
> Shengkai
>
> [1] https://github.com/apache/flink/pull/26358
>
> On Thu, Mar 27, 2025 at 10:05 Leonard Xu <xbjt...@gmail.com> wrote:
>
> Your link is broken, Shengkai
>
> Best,
> Leonard
>
> On Mar 27, 2025, at 10:01, Shengkai Fang <fskm...@gmail.com> wrote:
>
> Hi, All.
>
> I wrote a simple demo to illustrate my idea. Hope this helps.
>
> Best,
> Shengkai
>
> https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1
>
> On Wed, Mar 26, 2025 at 15:54 Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> > I'm fine with a separate SQL connector for metadata, so maybe we could
> > update the FLIP about our discussion?
>
> Sorry, I forgot this part. Yes, whichever option we choose, I'm going to
> update the FLIP.
>
> G
>
> On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> Hi All,
>
> I also lack knowledge of PTFs, so I've read just the motivation part:
>
> "The SQL 2016 standard introduced a way of defining custom SQL operators
> defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic table functions).
> ~200 pages define how this new kind of function can consume and produce
> tables with various execution properties.
> Unfortunately, this part of the standard is not publicly available."
>
> Of course we can take a look at some examples, but do we really want to
> expose state data with a construct that is described in ~200 pages, in a
> part of the standard that is not publicly available? 🙂
> I mean, the dataset is a couple of rows and the use case is a join with
> another table, such as state data.
> If somebody can point out advantages I would buy that, but from my limited
> understanding this would be overkill here.
>
> BR,
> G
>
> On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> Hi Zakelly, Shengkai!
>
> I don't know too much about PTFs; it would be interesting to see how the
> usage would look in practice.
>
> Do you have some mockup/example in mind of how the PTF would look, for
> example when we want to:
> - Simply display/aggregate what's in the metadata
> - Join keyed state with some metadata columns
>
> Thanks
> Gyula
>
> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>
> Hi everyone,
>
> I'm fine with a separate SQL connector for metadata, so maybe we could
> update the FLIP about our discussion? Also, Shengkai provided a PTF
> implementation; does that also meet the requirement?
>
> Best,
> Zakelly
>
> On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> Hi All,
>
> @Zakelly: Gyula correctly summarised what I meant, so please treat that
> content as mine.
> In addition, I'm not against adding a CLI at all. I'm just stating that in
> some cases like this, users would like to have a self-serving solution
> where they can provide SQL statements which can trigger alerts
> automatically.
>
> My personal opinion is that a CLI would be beneficial in several cases. A
> good example is when users want to restart a job from specific Kafka
> offsets which are persisted in a savepoint. In such a scenario users are
> more than happy, since they expect manual intervention with full control.
> So all in all, you can count on my +1 when a CLI FLIP comes up...
>
> BR,
> G
>
> On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> Hi!
>
> @Zakelly Lan <zakelly....@gmail.com>
> I think what Gabor means is that users want to have predefined SQL scripts
> to perform state analysis tasks to debug/identify problems, such as a SQL
> script that joins the metadata table with the state and does some
> analytics on it.
>
> If we have a meta table, then the SQL script that can do this is fixed and
> users can trigger it on demand by simply providing a new savepoint path.
>
> If we have a different mechanism to extract metadata that is not SQL
> native, then manual steps need to be executed and a custom SQL script
> would need to be written that adds the manually extracted metadata into
> the script.
>
> Cheers,
> Gyula
>
> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <zakelly....@gmail.com> wrote:
>
> Hi all,
>
> Thanks for your answers! Getting everyone aligned on this topic is
> challenging, but it’s definitely worth the effort since it will help
> streamline things moving forward.
>
> @Gabor, are you saying that users use scripts to define the SQL metadata
> connector and get the information? If so, would a CLI tool be more
> convenient? It's easy to invoke and returns the result swiftly. And there
> should be some other systems to track the checkpoint lineage and analyze
> whether there are outliers in the metadata (e.g. the state size of one
> operator), right? Well, maybe I missed something, so please correct me if
> I'm wrong.
>
> > I think the overall vision in Flink SQL is to provide a SQL native
> > environment where we can serve complex use-cases like you would expect
> > in a regular database.
>
> @Gyula Well, this is a good point. From the perspective of a comprehensive
> SQL experience, I'd +1 for treating metadata as data. Although I doubt
> there is a need for processing metadata, I won't be against a separate
> connector.
>
> Regarding the CLI tool, I still think it’s worth implementing. Such a tool
> could provide savepoint information before resuming from a savepoint,
> which would enhance the user experience in CLI-based workflows. It would
> be good if someone could implement this feature. We shouldn’t worry about
> whether this tool might be retired in the future. Regardless of the
> SQL-based solution we eventually adopt, this capability will remain
> essential for CLI users.
> But that is a separate topic.
>
> Best,
> Zakelly
>
> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> Hi.
>
> After reading the doc [1], I think Spark provides a function for users to
> consume the metadata from the savepoint. In Flink SQL, similar
> functionality is implemented through Polymorphic Table Functions (PTF), as
> proposed in FLIP-440 [2]. Below is a code example [3] illustrating this
> concept:
>
> ```
> public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
>     public void eval(Integer i, Boolean b) {
>         collectObjects(i, b);
>     }
> }
> ```
>
> ```
> INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
> ```
>
> So we can add a built-in function named `read_state_metadata` to read
> savepoint data.
>
> Best,
> Shengkai
>
> [1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
> [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
>
> On Wed, Mar 19, 2025 at 18:37 Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> Hi All!
>
> Thank you for the answers and concerns from everyone.
>
> On the CLI vs State Metadata Connector/Table question, I would also like
> to step back a little and look at the bigger picture.
>
> I think the overall vision in Flink SQL is to provide a SQL native
> environment where we can serve complex use-cases like you would expect in
> a regular database.
> Most features and developments in recent years have gone this way.
>
> The State Metadata Table would be a natural and straightforward fit here.
> So from my side, +1 for that.
>
> However, I could understand if we are not ready to add a new
> connector/format due to maintenance concerns (and general concerns about
> the design).
> If that's the issue, then we should spend more time on the design to get
> comfortable with the approach and seek feedback from the wider community.
>
> I am -1 for the CLI/tooling approach, as it will not provide any of the
> features we are looking for that are not already covered by the Java
> connector.
> And that approach would come with the same maintenance implications.
>
> Cheers
> Gyula
>
> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>
> Hi Zakelly, Shengkai
>
> Several topics are going on, so I'm adding brief answers to them. If some
> topic is not touched, please highlight it.
>
> @Shengkai: I've read through all the previous FLIPs related to catalogs,
> and if we would like to keep the concepts there, then a one-to-one mapping
> between savepoint and catalog is a reasonable direction. In short, I'm
> happy that you've highlighted this and I agree as a whole. I've written it
> down previously; I just want to double confirm that a state catalog is
> essential and planned. When we reach that point, your input is more than
> welcome.
>
> @Zakelly: We've already tried the CLI and separate library approaches with
> users, and they were not welcomed, for the following reasons:
> * Users want automated tasks, not manual parsing of CLI/library output.
>   This can be hacked around, but our experience with that is negative
>   because it's just brittle.
> * From a development perspective it's a much bigger effort than a
>   connector (hard to test, packaging/version handling is an extra layer of
>   complexity, external FS authentication is a pain for users, as is
>   expecting them to download savepoints).
> * Purely personal opinion, but if we find better ways later, retiring a
>   CLI is no more lightweight than retiring a connector.
>
> > It would be great if you give some examples on how user could leverage
> > the separate connector to process the metadata.
>
> The simplest cases:
> * give me the overgrowing state uids
> * give me the unknown (new or renamed) state uids
> * give me the state uids where the state size dropped drastically compared
>   to a previous savepoint (accidental state loss)
>
> Since it was mentioned: as a general off-topic teaser, yes, it would be
> good to have some sort of checkpoint/savepoint lineage, or whatever we end
> up calling it. Since we've not yet reached this point there are no
> technical details; it's more like a vision. It's a common pattern that
> jobs are physically running but somehow the state processing is stuck, and
> it would be good to add some way to find that out automatically.
> The key point here is automation rather than manual evaluation, since
> handling 10k+ jobs simply does not allow the latter.
>
> BR,
> G
>
> On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> Hi, All.
>
> About the State Catalog, I want to share some more thoughts.
>
> In the initial design concept, I understood that a savepoint and a state
> catalog have a one-to-one mapping relationship. Each operator corresponds
> to a database, and the state of each operator is represented as individual
> tables. The rationale behind this design is:
>
> *State Diversity*: An operator may involve multiple types of states. For
> example, in our VVR design, a "multi-join" operator uses keyed states for
> two input streams and a broadcast state for the third stream.
> This makes it challenging to represent all states of an operator within a
> single table.
>
> *Scalability*: Internally, an operator might have multiple keyed states
> (e.g., value state and list state). However, large list states may not fit
> entirely in memory. To address this, we recommend implementing each state
> as a separate table.
>
> To resolve the loosely coupled relationships between operator states, we
> propose embedding predefined views within the catalog. These views
> simplify user understanding of operator implementations and provide a more
> intuitive perspective.
For instance, a join operator may have multiple state implementations (depending on whether the join key includes unique attributes), but users primarily care about the data associated with a specific join key across input streams.

Returning to the one-to-one mapping between savepoints and catalogs, we aim to manage multiple user state catalogs through a catalog store. When a user triggers a savepoint for a job on the platform:

1. The platform sends a REST request to the JobManager.
2. Simultaneously, it registers a new state catalog in the catalog store, enabling immediate analysis of state data on the platform.
3. Deleting a savepoint would also trigger the removal of its associated catalog.
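The lifecycle in the three steps above could be sketched roughly as follows. This is only an illustration under assumptions: `StateCatalogStoreSketch` and its methods are hypothetical names, it is an in-memory stand-in rather than Flink's catalog store, and the `savepoint-<id>` naming simply mirrors the example used later in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a catalog store; not a Flink API. */
final class StateCatalogStoreSketch {
    // savepoint id -> catalog name; one catalog per savepoint (one-to-one mapping)
    private final Map<String, String> catalogsBySavepoint = new LinkedHashMap<>();

    /** Step 2: register a catalog as soon as the savepoint is triggered. */
    String onSavepointTriggered(String savepointId) {
        String catalogName = "savepoint-" + savepointId;
        if (catalogsBySavepoint.putIfAbsent(savepointId, catalogName) != null) {
            throw new IllegalStateException("Catalog already registered for " + savepointId);
        }
        return catalogName;
    }

    /** Step 3: deleting a savepoint removes its catalog; returns true if one existed. */
    boolean onSavepointDeleted(String savepointId) {
        return catalogsBySavepoint.remove(savepointId) != null;
    }

    boolean hasCatalog(String savepointId) {
        return catalogsBySavepoint.containsKey(savepointId);
    }

    public static void main(String[] args) {
        StateCatalogStoreSketch store = new StateCatalogStoreSketch();
        System.out.println(store.onSavepointTriggered("42"));
        System.out.println(store.onSavepointDeleted("42"));
    }
}
```

Step 1 (the REST request to the JobManager) is deliberately left out, since it depends on the platform.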

This vision assumes that states are self-describing or that a state metaservice is introduced to analyze savepoint structures.

> How can users create logic to identify differences between multiple savepoints?

Since savepoints and state catalogs are one-to-one mapped, users can query metadata via their respective catalogs. For example:

1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>` provides operator-specific metadata (e.g., state size, type).
2. Comparing metadata tables (e.g., schema versions, state entry counts) across catalogs reveals structural or quantitative differences.
3. For deeper analysis, users could write SQL queries to compare specific state partitions or leverage the metaservice to track state evolution (e.g., added/removed operators, modified state configurations).
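To make the comparison in points 2 and 3 concrete, here is a minimal sketch of diffing per-operator metadata from two savepoints. It assumes the metadata has already been read into plain maps of operator name to state size in bytes; `SavepointMetadataDiff` and that row shape are hypothetical and are not taken from the FLIP.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: compares per-operator state sizes of two savepoints. */
final class SavepointMetadataDiff {
    /**
     * Both inputs map operator name -> state size in bytes (an assumed column).
     * Returns operator name -> description for every operator that was added,
     * removed, or whose state size changed; unchanged operators are omitted.
     */
    static Map<String, String> diff(Map<String, Long> older, Map<String, Long> newer) {
        Map<String, String> changes = new TreeMap<>();
        for (Map.Entry<String, Long> e : older.entrySet()) {
            Long after = newer.get(e.getKey());
            if (after == null) {
                changes.put(e.getKey(), "removed");
            } else if (!after.equals(e.getValue())) {
                changes.put(e.getKey(), "size " + e.getValue() + " -> " + after);
            }
        }
        for (String operator : newer.keySet()) {
            if (!older.containsKey(operator)) {
                changes.put(operator, "added");
            }
        }
        return changes;
    }
}
```

With a catalog in place, the same comparison would more naturally be a SQL join across the two catalogs' metadata tables; the Java form is used here only so the sketch stays self-contained.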

If we plan to introduce a state catalog in the future, I would lean toward using metadata tables. If a utility tool can address the challenges we face, could we avoid introducing an additional connector?

Best,
Shengkai

On Mon, Mar 17, 2025 at 20:25, Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi All!

Without going into too much detail, here are my 2 cents regarding the virtual column / catalog metadata / table (connector) discussion for the State metadata.

State metadata such as the types of states, their properties, names, sizes etc. are all valuable information that can be used to enrich the computations we do on state.
We can either analyze it standalone (such as discover anomalies, for large jobs with many states), across multiple savepoints (discover how state changed over time) or by joining it with keyed or non-keyed state data to serve more complex queries on the state.

The only solution that seems to serve all these use-cases and requirements in a straightforward and SQL canonical way is to simply expose the state metadata as a separate table. This is a metadata table but you can also think of it as data table, it makes no practical difference here.

Once we have a catalog later, the catalog can offer this table out of the box, the same way databases provide metadata tables.
For this to work, however, we need another, simpler connector that creates this table.

+1 for state metadata as a separate connector/table, instead of adding virtual columns and ad hoc catalog metadata that is hard to use in a large number of queries.

Cheers,
Gyula

On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

1. State TTL for Value Columns

> I’m planning on adding this, and we may collaborate on it in the future.

+1 on this, just ping me.

2. Metadata Table vs. Metadata Column

After some code digging and a POC, all I can say is that with heavy effort we can maybe add such changes that we're able to show metadata of a savepoint from the catalog.
I'm not against that, but from a user perspective this has limited value; let me explain why.

From a high-level perspective I see the following, on which I see agreement:
* We should have a catalog which represents the savepoint data set of one or more jobs (future plan)
* Savepoints should be able to be registered in the catalog, where they become databases (future plan)
* There must be a possibility to create tables from databases where users can read state data (exists already)

In terms of metadata, if I understand correctly, the suggested approach would be to access it from the catalog describe command, right? Adding that info when a specific database describe command is executed could be done.

The question is, for instance, how can users create logic that tells them what the difference is between multiple savepoints?
Just to give some examples:
* per operator size changes between savepoints
* show values from operator data where state size reaches a boundary
* in general "find which checkpoint ruined things" is quite a common pattern
What I would like to highlight here is that from Flink's point of view the metadata can be considered static side-output information, but for users these values are actual real data around which logic is planned to be built.

> The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill.

State data is also static within a savepoint, and that's the reason why the state processor API works in batch mode.
When we handle multiple checkpoints in a streaming fashion, this can be viewed from another angle.

We can come up with a more lightweight solution than a new connector, but forcing users to parse the catalog describe command output in order to compare multiple savepoints doesn't sound like a smooth user experience.
Honestly, I have no other idea for how to expose metadata as real user data, so I'm waiting on other approaches.

BR,
G

On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <fskm...@gmail.com> wrote:

Looking forward to hearing the good news!

Best,
Shengkai

On Wed, Mar 12, 2025 at 22:24, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Thanks to both of you for the valuable input!

Let me take a closer look at the suggestions, like the Catalog capabilities and the possibility of embedding TypeInformation or StateDescriptor metadata directly into the raw state files...

BR,
G

On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <fskm...@gmail.com> wrote:

Thanks for Zakelly's clarification.

1. State TTL for Value Columns

+1 to delay the discussion about this.

2. Metadata Table vs. Metadata Column

I’d like to share my perspective on the State Catalog proposal. While introducing this capability is beneficial, there is a blocker: the current StateBackend architecture does not permit operators to encode TypeInformation into the state; it only preserves the Serializer.
This limitation creates an asymmetry, as operators alone retain knowledge of the data structure’s schema.

To address this, I suggest allowing operators to embed TypeInformation or StateDescriptor metadata directly into the raw state files. Such a design would enable the Catalog to:

1. Parse state files and programmatically derive the schema and structural guarantees for each state.
2. Leverage existing Flink Table utilities, such as LegacyTypeInfoDataTypeConverter (in org.apache.flink.table.types.utils), to bridge TypeInformation and DataType conversions.

If we cannot store the TypeInformation or StateDescriptor in the raw state files, I am +1 for this FLIP to use a metadata column to retrieve the information.

Best,
Shengkai

On Wed, Mar 12, 2025 at 12:43, Zakelly Lan <zakelly....@gmail.com> wrote:

Hi Gabor and Shengkai,

Thanks for sharing your thoughts! This is a long discussion and sorry for the late reply (I'm busy catching up with release 2.0 these days).

1. State TTL for Value Columns

Let me first clarify your thoughts to ensure I understand correctly.
IIUC, there is no persistent configuration for state TTL in the checkpoint. While you can infer that TTL is enabled by reading the serializer, the checkpoint itself only stores the last access time for each value. So the only thing we can show is the last access time for each value. But it is not required for all state backends to store this, as they may directly store the expired time. This will also increase the difficulty of implementation & maintenance.

This once again reiterates the importance of unified metadata for checkpoints.
I’m planning on adding this, and we may collaborate on it in the future.

2. Metadata Table vs. Metadata Column

I'm not in favor of adding a new connector for metadata. The metadata is more like one-time information instead of a streaming data that changes all the time, so a single connector seems to be an overkill. It is not easy to withdraw a connector if we have a better solution in future.
I'm not familiar with current Catalog capabilities, and if it could extract and show some operator-level information from savepoint, that would be great.

If the Catalog can't do that, I would consider the current FLIP to be a compromise solution.

And if we have that unified metadata for checkpoint/savepoint in future, we may directly register savepoint in catalog, and create a source without specifying complex columns, as well as describe the savepoint catalog to get the metadata. That's a good solution in my mind.

Best,
Zakelly

On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <fskm...@gmail.com> wrote:

Hi Gabor,

> 2. Adding a new connector with `savepoint-metadata`

I would argue against introducing a new connector type named savepoint-metadata, as the existing Catalog mechanism can inherently provide the necessary connector factory capabilities. I’ve detailed this proposal in branch[1]. Please take a moment to review it.

If we introduce a connector named `savepoint-metadata`, it means user can create a temporary table with connector `savepoint-metadata` and the connector needs to check whether table schema is same to the schema we proposed in the FLIP. On the other hand, it's not easy work for others to users a metadata table with same schema.

[1] https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63

Best,
Shengkai

On Tue, Mar 11, 2025 at 16:56, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

Hi Shengkai,

> 1. State TTL for Value Columns

Directionally, I agree with your idea of how it can be implemented.
Previously I've mentioned that TTL information is not exposed on the state processor API (which the SQL state connector uses to read data), and unless somebody shows me the opposite, this FLIP is not going to address this, to avoid feature creep. Our users are also interested in TTL, so sooner or later we're going to expose it; this is a matter of scheduling.

> 2. Adding a new connector with `savepoint-metadata`

Not sure I understand your point about StateCatalog at all.
>>> >>> > > > > > >>>>>>>>>>>>>>> First >>> >>> > > > > > >>>>>>>>>>>>>>>>> of >>> >>> > > > > > >>>>>>>>>>>>>>>>>>> all >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> I can't agree more that StateCatalog >>> is >>> >>> > > > > > >>>> needed >>> >>> > > > > > >>>>>>> and >>> >>> > > > > > >>>>>>>>> is a >>> >>> > > > > > >>>>>>>>>>>>> planned >>> >>> > > > > > >>>>>>>>>>>>>>>>>> building >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> block in an upcoming >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> FLIP but not sure how can it help >>> now? No >>> >>> > > > > > >>>>> matter >>> >>> > > > > > >>>>>>>>> what, >>> >>> > > > > > >>>>>>>>>>> your >>> >>> > > > > > >>>>>>>>>>>>>>>> knowledge >>> >>> > > > > > >>>>>>>>>>>>>>>>>> is >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> essential when we add StateCatalog. >>> Let >>> >>> > > > > > >> me >>> >>> > > > > > >>>>>>> expose >>> >>> > > > > > >>>>>>>> my >>> >>> > > > > > >>>>>>>>>>>>>>> understanding >>> >>> > > > > > >>>>>>>>>>>>>>>> in >>> >>> > > > > > >>>>>>>>>>>>>>>>>>> this >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> area: >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * First we need create table >>> statements >>> >>> > > > > > >> to >>> >>> > > > > > >>>>>>> access >>> >>> > > > > > >>>>>>>>> state >>> >>> > > > > > >>>>>>>>>>>> data >>> >>> > > > > > >>>>>>>>>>>>>> and >>> >>> > > > > > >>>>>>>>>>>>>>>>>> metadata >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * When we have that then we can add >>> >>> > > > > > >>>>> StateCatalog >>> >>> > > > > > >>>>>>>>> which >>> >>> > > > > > >>>>>>>>>>>> could >>> >>> > > > > > >>>>>>>>>>>>>>>>>> potentially >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> ease the life of users by for ex. 
>>> giving >>> >>> > > > > > >>>>>>>>> off-the-shelf >>> >>> > > > > > >>>>>>>>>>>> tables >>> >>> > > > > > >>>>>>>>>>>>>>>> without >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> sweating with create table statements >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> User expectations: >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See state data (this is fulfilled >>> with >>> >>> > > > > > >>>> the >>> >>> > > > > > >>>>>>>> existing >>> >>> > > > > > >>>>>>>>>>>>>> connector) >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See metadata about state data like >>> TTL >>> >>> > > > > > >>>> (this >>> >>> > > > > > >>>>>>> can >>> >>> > > > > > >>>>>>>> be >>> >>> > > > > > >>>>>>>>>>> added >>> >>> > > > > > >>>>>>>>>>>>> as >>> >>> > > > > > >>>>>>>>>>>>>>>>> metadata >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> column as you suggested since it >>> belongs >>> >>> > > > > > >> to >>> >>> > > > > > >>>>> the >>> >>> > > > > > >>>>>>>> data) >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> * See metadata about operators (this >>> can >>> >>> > > > > > >> be >>> >>> > > > > > >>>>>>> added >>> >>> > > > > > >>>>>>>>> from >>> >>> > > > > > >>>>>>>>>>>>>>>>>>> savepoint-metadata) >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> Important to highlight that state data >>> >>> > > > > > >>>> table >>> >>> > > > > > >>>>>>> format >>> >>> > > > > > >>>>>>>>>>> differs >>> >>> > > > > > >>>>>>>>>>>>>> from >>> >>> > > > > > >>>>>>>>>>>>>>>>> state >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> metadata table format. Namely one >>> table >>> >>> > > > > > >> has >>> >>> > > > > > >>>>> rows >>> >>> > > > > > >>>>>>>> for >>> >>> > > > > > >>>>>>>>>>> state >>> >>> > > > > > >>>>>>>>>>>>>> values >>> >>> > > > > > >>>>>>>>>>>>>>>> and >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> another has rows for operators, right? 
>> I think that's the reason why you've pointed out that the suggested metadata columns are somewhat clunky.
>>
>> In conclusion, I agree to add a ${state-name}_ttl metadata column later on, since it belongs to the state value, and to add a new table type for metadata (as you suggested, similar to PG [1]). Please see how Spark does that too [2].
>>
>> If you have a better approach, please elaborate with more details and help me understand your point.
>>
>>> Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per-key state itself.
>>> But again, this is a good feature as-is and can be handled in a separate jira.
>>
>> I've just created https://issues.apache.org/jira/browse/FLINK-37456.
>>
>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>> [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
>>
>> BR,
>> G
>>
>> On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>
>>> Hi, Gabor. Thanks for your response.
>>>
>>>> 1. State TTL for Value Columns
>>>
>>> Thank you for addressing the limitations here.
>>> However, I believe it would be beneficial to further clarify the API in this FLIP regarding how users can specify the TTL column.
>>>
>>> One potential approach that comes to mind is using a standardized naming convention such as ${state-name}_ttl for the metadata column that defines the TTL value. In terms of implementation, the listReadableMetadata function could:
>>>
>>> 1. Read the table’s columns and configuration,
>>> 2. Extract all defined state names, and
>>> 3. Return a structured list of metadata entries formatted as ${state-name}_ttl.
>>>
>>> WDYT?
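The three listReadableMetadata steps proposed above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `TtlMetadataKeys`, the method `ttlMetadataFor`, the sample state names, and the `BIGINT` type are hypothetical and do not come from the FLIP. The sketch deliberately avoids Flink dependencies; a real connector would presumably return a `Map<String, DataType>` from `SupportsReadingMetadata#listReadableMetadata` instead of a map of strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Flink): derives TTL metadata keys from state names. */
final class TtlMetadataKeys {

    /** Suffix taken from the ${state-name}_ttl convention proposed in this thread. */
    static final String TTL_SUFFIX = "_ttl";

    /**
     * Maps every state name to a metadata key of the form "<state-name>_ttl".
     * The value is a placeholder SQL type name; the type is an assumption,
     * since the thread does not say how a TTL would be represented.
     */
    static Map<String, String> ttlMetadataFor(List<String> stateNames) {
        // LinkedHashMap keeps the keys in the order the states were declared.
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String stateName : stateNames) {
            metadata.put(stateName + TTL_SUFFIX, "BIGINT");
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Steps 1 and 2 (reading the table definition and extracting state names)
        // are replaced by a hard-coded, hypothetical list here.
        List<String> stateNames = List.of("my-value-state", "my-list-state");

        // Step 3: one metadata entry per state name.
        ttlMetadataFor(stateNames)
                .forEach((key, type) -> System.out.println(key + " " + type));
    }
}
```

Deriving the keys from the state names keeps the metadata list in sync with the table definition, at the cost of reserving the `_ttl` suffix for this purpose.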
>>>
>>>> 2. Adding a new connector with `savepoint-metadata`
>>>
>>> Introducing a new connector type at this stage may unnecessarily complicate the system. Given that every table already belongs to a Catalog, which is designed to provide a Factory for building source or sink connectors, I propose integrating a dedicated StateCatalog instead. This approach would allow us to:
>>>
>>> 1. Leverage the Catalog’s existing capabilities to manage TTL metadata (e.g., state names and TTL logic) without duplicating functionality.
>>> 2. Provide a unified interface for connector instantiation and metadata handling through the Catalog’s Factory pattern.
>>>
>>> Would this design decision better align with our architecture’s extensibility and reduce redundancy?
>>>
>>>> Up until now we've seen even in TB savepoints that the number of keys can be extremely huge but not the per-key state itself.
>>>> But again, this is a good feature as-is and can be handled in a separate jira.
>>>
>>> +1 for a separate jira.
>>>
>>> Best,
>>> Shengkai
>>>
>>> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 10, 2025 at 19:05:
>>>
>>>> Hi Shengkai,
>>>>
>>>> Please see my comments inline.
>>>>
>>>> BR,
>>>> G
>>>>
>>>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>
>>>>> Hi, Gabor. Thanks for the FLIP. I have some questions about the FLIP:
>>>>>
>>>>> 1. State TTL for Value Columns
>>>>> How can users retrieve the state TTL (Time-to-Live) for each value column?
>>>>> From my understanding of the current design, it seems that this functionality is not supported. Could you clarify if there are plans to address this limitation?
>>>>
>>>> Since the state processor API does not yet expose this information, this would require several steps.
>>>> First, support needs to be added to the state processor API, which can then be exposed on the SQL API.
>>>> This is definitely a useful future improvement and can be handled in a separate jira.
>>>>
>>>>> 2. Metadata Table vs. Metadata Column
>>>>> The metadata information described in the FLIP appears to be intended to describe the state files stored at a specific location. To me, this concept aligns more closely with system tables like pg_tables in PostgreSQL [1] or the INFORMATION_SCHEMA in MySQL [2].
>>>>
>>>> Adding a new connector with `savepoint-metadata` is a possibility where we can create such functionality.
>>>> I'm not against that; I just want to have a common agreement that we would like to move in that direction.
>>>> (As a side note, not just PG but Spark also has a similar approach, and I basically like the idea.)
>>>> If we go in that direction, savepoint metadata can be exposed in a way that one row represents an operator with its values, something like this:
>>>>
>>>> ┌───────────────────────────┬───────────────────────────────┬──────────────────────────────────┬─────────────┬────────────────┬────────────────────┬─────────────────────────────┬────────────────────────┐
>>>> │ operatorName              │ operatorUid                   │ operatorHash                     │ parallelism │ maxParallelism │ subtaskStatesCount │ coordinatorStateSizeInBytes │ totalStatesSizeInBytes │
>>>> ├───────────────────────────┼───────────────────────────────┼──────────────────────────────────┼─────────────┼────────────────┼────────────────────┼─────────────────────────────┼────────────────────────┤
>>>> │ Source: datagen-source    │ datagen-source-uid            │ 47aee94394d6ea26e2d544bef0a37bb5 │ 2           │ 128            │ 2                  │ 16                          │ 546                    │
>>>> ├───────────────────────────┼───────────────────────────────┼──────────────────────────────────┼─────────────┼────────────────┼────────────────────┼─────────────────────────────┼────────────────────────┤
>>>> │ long-udf-with-master-hook │ long-udf-with-master-hook-uid │ 6ed3f40bff3c8dfcdfcb95128a1018f1 │ 2           │ 128            │ 2                  │ 0                           │ 0                      │
>>>> ├───────────────────────────┼───────────────────────────────┼──────────────────────────────────┼─────────────┼────────────────┼────────────────────┼─────────────────────────────┼────────────────────────┤
>>>> │ value-process             │ value-process-uid             │ ca4f5fe9a637b656f09ea78b3e7a15b9 │ 2           │ 128            │ 2                  │ 0                           │ 40726                  │
>>>> ├───────────────────────────┼───────────────────────────────┼──────────────────────────────────┼─────────────┼────────────────┼────────────────────┼─────────────────────────────┼────────────────────────┤
>>>>
>>>> This table can then be joined with the tables created by the already existing `savepoint` connector based on the UID hash (which is unique and always exists).
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This would mean that the already >>> >>> > > > > > >>>> existing >>> >>> > > > > > >>>>>>> table >>> >>> > > > > > >>>>>>>>>> would >>> >>> > > > > > >>>>>>>>>>>>> need >>> >>> > > > > > >>>>>>>>>>>>>>>> only a >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> single >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> metadata column which is the UID >>> >>> > > > > > >> hash. >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> WDYT? >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> @zakelly, plz share your thoughts >>> >>> > > > > > >> too. >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> If we opt to use metadata columns, >>> >>> > > > > > >>>> every >>> >>> > > > > > >>>>>>>> record >>> >>> > > > > > >>>>>>>>>> in >>> >>> > > > > > >>>>>>>>>>>> the >>> >>> > > > > > >>>>>>>>>>>>>>> table >>> >>> > > > > > >>>>>>>>>>>>>>>>>> would >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> end >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> up >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> having identical values for these >>> >>> > > > > > >>>>> columns >>> >>> > > > > > >>>>>>>>> (please >>> >>> > > > > > >>>>>>>>>>>>> correct >>> >>> > > > > > >>>>>>>>>>>>>>> me >>> >>> > > > > > >>>>>>>>>>>>>>>> if >>> >>> > > > > > >>>>>>>>>>>>>>>>>> I’m >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> mistaken). On the other hand, the >>> >>> > > > > > >>>> state >>> >>> > > > > > >>>>>>>>> connector >>> >>> > > > > > >>>>>>>>>>>>>> requires >>> >>> > > > > > >>>>>>>>>>>>>>>>> users >>> >>> > > > > > >>>>>>>>>>>>>>>>>> to >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> specify >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> an operator UID or operator UID >>> >>> > > > > > >> hash, >>> >>> > > > > > >>>>>>> after >>> >>> > > > > > >>>>>>>>> which >>> >>> > > > > > >>>>>>>>>>> it >>> >>> > > > > > >>>>>>>>>>>>>>> outputs >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>> user-defined >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> values in its records. 
This >>> >>> > > > > > >> approach >>> >>> > > > > > >>>>> feels >>> >>> > > > > > >>>>>>>>>> somewhat >>> >>> > > > > > >>>>>>>>>>>>>>> redundant >>> >>> > > > > > >>>>>>>>>>>>>>>>> to >>> >>> > > > > > >>>>>>>>>>>>>>>>>>> me. >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> If we would add a new >>> >>> > > > > > >>>> `savepoint-metadata` >>> >>> > > > > > >>>>>>>>>> connector >>> >>> > > > > > >>>>>>>>>>>> then >>> >>> > > > > > >>>>>>>>>>>>>>> this >>> >>> > > > > > >>>>>>>>>>>>>>>>> can >>> >>> > > > > > >>>>>>>>>>>>>>>>>> be >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> addressed. >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> On the other hand UID and UID hash >>> >>> > > > > > >> are >>> >>> > > > > > >>>>>>> having >>> >>> > > > > > >>>>>>>>>>> either-or >>> >>> > > > > > >>>>>>>>>>>>>>>>>> relationship >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> from >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> config perspective, >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> so when a user provides the UID then >>> >>> > > > > > >>>>> he/she >>> >>> > > > > > >>>>>>> can >>> >>> > > > > > >>>>>>>>> be >>> >>> > > > > > >>>>>>>>>>>>>> interested >>> >>> > > > > > >>>>>>>>>>>>>>>> in >>> >>> > > > > > >>>>>>>>>>>>>>>>>> the >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>> hash >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> for further calculations >>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> (the whole Flink internals are >>> >>> > > > > > >>>> depending >>> >>> > > > > > >>>>> on >>> >>> > > > > > >>>>>>> the >>> >>> > > > > > >>>>>>>>>>> hash). 
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Printing out the human readable UID
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> is an explicit requirement from the user side because hashes are not human
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> readable.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 3. Handling LIST and MAP States in the State Connector
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> I have concerns about how the current design handles LIST and MAP states.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Specifically, the state connector uses Flink SQL’s MAP and ARRAY types,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> which implies that it attempts to load entire MAP or LIST states into
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> memory.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> However, in many real-world scenarios, these states can grow very large.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Typically, the state API addresses this by providing an iterator to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> traverse elements within the state incrementally. I’m unsure whether I’ve
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> missed something in FLIP-496 or FLIP-512, but it seems that the current
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> design might struggle with scalability in such cases.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> You're reading it correctly: the current implementation keeps the state for
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> a single key in memory.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> We considered this potential issue back then and concluded that it is not
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> necessarily needed for the initial version and can be done as a later
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> improvement.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> So far we've seen, even in TB-sized savepoints, that the number of keys can
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> be extremely large, but the per-key state itself is not.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in a separate
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Jira.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 3, 2025 at 02:00:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> Hi Zakelly,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> To keep things simple, the target is `METADATA VIRTUAL` as the keywords
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> for definition.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> If it's not super complex, the latter can be added too.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> G
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi Gabor,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> +1 for this.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Will the metadata column use `METADATA VIRTUAL` as the keywords for
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> definition, or `METADATA FROM xxx VIRTUAL` for renaming, just like the
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Kafka table?
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Zakelly
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Hi All,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to start a discussion of FLIP-512: Add meta information to
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> SQL state connector [1].
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Feel free to add your thoughts to make this feature better.
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> BR,
>>> >>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> G