In the meantime I've just updated the FLIP according to this to be
optimistic 🙂

BR,
G

On Thu, Mar 27, 2025 at 2:15 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Considering all the facts I also +1 on PTF. Even if something is missing
> we can add later.
>
> @Zakelly Lan <zakelly....@gmail.com> @Shengkai Fang are you also on the
> same page or have something to add?
>
> BR,
> G
>
>
> On Thu, Mar 27, 2025 at 1:50 PM Lincoln Lee <lincoln.8...@gmail.com>
> wrote:
>
>> +1 for PTF
>>
>> > Is it possible to describe such function to see the column names/types?
>>
>> Although Flink SQL does not directly support this feature, users can
>> achieve
>> similar results with the help of `explain` syntax, e.g.
>> 'explain select * from read_state_metadata(...)'
>>
>>
>> Best,
>> Lincoln Lee
>>
>>
>> On Thu, Mar 27, 2025 at 20:41, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>> > Hey!
>> >
>> > I think the PTF approach strikes a great balance in simplicity and the
>> > capabilities that we get out of it.
>> >
>> > I think this could be a completely viable alternative to the dedicated
>> > connector, +1.
>> >
>> > Cheers,
>> > Gyula
>> >
>> > On Thu, Mar 27, 2025 at 10:37 AM Shengkai Fang <fskm...@gmail.com>
>> wrote:
>> >
>> > > Hi, Gabor.
>> > >
>> > > > Do I understand correctly that this is 2.x only feature and we can't
>> > > backport it to 1.x line
>> > >
>> > > Yes. PTF is only supported in 2.x versions.
>> > >
>> > > > Is it possible to describe such function to see the column
>> names/types?
>> > >
>> > > Flink SQL doesn't support this feature, but PostgreSQL[2] and MySQL[1] have a
>> > > similar feature.
>> > >
>> > > [1]
>> https://dev.mysql.com/doc/refman/8.4/en/show-create-procedure.html
>> > > [2]
>> > >
>> > >
>> >
>> https://stackoverflow.com/questions/6898453/show-the-code-of-a-function-procedure-and-trigger-in-postgresql
>> > >
>> > > Best,
>> > > Shengkai
>> > >
>> > >
>> > > On Thu, Mar 27, 2025 at 16:25, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> > >
>> > > > Hi Shengkai,
>> > > >
>> > > > Thanks for your effort with the example, this looks promising.
>> > > > I like the fact that users wouldn't need to sweat with complex
>> create
>> > > table
>> > > > statements.
>> > > >
>> > > > Couple of questions:
>> > > > * Do I understand correctly that this is 2.x only feature and we
>> can't
>> > > > backport it to 1.x line?
>> > > > I don't intend to do any backport; I would just like to know the technical
>> > > > constraints.
>> > > > * Is it possible to describe such function to see the column
>> > names/types?
>> > > >
>> > > > BR,
>> > > > G
>> > > >
>> > > >
>> > > > On Thu, Mar 27, 2025 at 3:17 AM Shengkai Fang <fskm...@gmail.com>
>> > wrote:
>> > > >
>> > > > > Many thanks for your reminder, Leonard. Here's the link I
>> > mentioned[1].
>> > > > >
>> > > > > Best,
>> > > > > Shengkai
>> > > > >
>> > > > > [1] https://github.com/apache/flink/pull/26358
>> > > > >
>> > > > > On Thu, Mar 27, 2025 at 10:05, Leonard Xu <xbjt...@gmail.com> wrote:
>> > > > >
>> > > > > > Your link is broken, Shengkai
>> > > > > >
>> > > > > > Best,
>> > > > > > Leonard
>> > > > > >
>> > > > > > > On Mar 27, 2025, at 10:01, Shengkai Fang <fskm...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > Hi, All.
>> > > > > > >
>> > > > > > > I wrote a simple demo to illustrate my idea. Hope this helps.
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Shengkai
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1
>> > > > > > >
>> > > > > > > On Wed, Mar 26, 2025 at 15:54, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> > > > > > >
>> > > > > > >>> I'm fine with a separate SQL connector for metadata, so
>> maybe
>> > we
>> > > > > could
>> > > > > > >> update the FLIP about our discussion?
>> > > > > > >>
>> > > > > > >> Sorry, I've forgotten this part. Yeah, no matter what we choose
>> I'm
>> > > going
>> > > > > to
>> > > > > > >> update the FLIP.
>> > > > > > >>
>> > > > > > >> G
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <
>> > > > > > gabor.g.somo...@gmail.com>
>> > > > > > >> wrote:
>> > > > > > >>
>> > > > > > >>> Hi All,
>> > > > > > >>>
>> > > > > > >>> I also lack knowledge of PTFs, so I've read just the motivation
>> > > > > > >>> part:
>> > > > > > >>>
>> > > > > > >>> "The SQL 2016 standard introduced a way of defining custom
>> SQL
>> > > > > > operators
>> > > > > > >>> defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic table
>> > > > > functions).
>> > > > > > >>> ~200 pages define how this new kind of function can consume
>> and
>> > > > > produce
>> > > > > > >>> tables with various execution properties.
>> > > > > > >>> Unfortunately, this part of the standard is not publicly
>> > > > available."
>> > > > > > >>>
>> > > > > > >>> Of course we can take a look at some examples but do we
>> really
>> > > want
>> > > > > to
>> > > > > > >>> expose state data with this construct
>> > > > > > >>> which is described in ~200 pages and part of the standard is
>> > not
>> > > > > > publicly
>> > > > > > >>> available? 🙂
>> > > > > > >>> I mean the dataset is a couple of rows and the use case is a join
>> > > > > > >>> with another table, such as one with state data.
>> > > > > > >>> If somebody can point out advantages I would buy that, but from my
>> > > > > > >>> limited understanding this would be overkill here.
>> > > > > > >>>
>> > > > > > >>> BR,
>> > > > > > >>> G
>> > > > > > >>>
>> > > > > > >>>
>> > > > > > >>> On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <
>> > gyula.f...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >>>
>> > > > > > >>>> Hi Zakelly , Shengkai!
>> > > > > > >>>>
>> > > > > > >>>> I don't know too much about PTFs, it would be interesting
>> to
>> > see
>> > > > how
>> > > > > > the
>> > > > > > >>>> usage would look in practice.
>> > > > > > >>>>
>> > > > > > >>>> Do you have some mockup/example in mind how the PTF would
>> look
>> > > for
>> > > > > > >> example
>> > > > > > >>>> when we want to:
>> > > > > > >>>> - Simply display/aggregate what's in the metadata
>> > > > > > >>>> - Join keyed state with some metadata columns
>> > > > > > >>>>
>> > > > > > >>>> Thanks
>> > > > > > >>>> Gyula
>> > > > > > >>>>
>> > > > > > >>>> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <
>> > > > zakelly....@gmail.com>
>> > > > > > >>>> wrote:
>> > > > > > >>>>
>> > > > > > >>>>> Hi everyone,
>> > > > > > >>>>>
>> > > > > > >>>>> I'm fine with a separate SQL connector for metadata, so
>> maybe
>> > > we
>> > > > > > could
>> > > > > > >>>>> update the FLIP about our discussion? And Shengkai
>> provides a
>> > > PTF
>> > > > > > >>>>> implementation, does that also meet the requirement?
>> > > > > > >>>>>
>> > > > > > >>>>>
>> > > > > > >>>>> Best,
>> > > > > > >>>>> Zakelly
>> > > > > > >>>>>
>> > > > > > >>>>> On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <
>> > > > > > >>>> gabor.g.somo...@gmail.com>
>> > > > > > >>>>> wrote:
>> > > > > > >>>>>
>> > > > > > >>>>>> Hi All,
>> > > > > > >>>>>>
>> > > > > > >>>>>> @Zakelly: Gyula correctly summarised what I meant, so please
>> > > > > > >>>>>> treat the content as mine.
>> > > > > > >>>>>> In addition, I'm not against adding a CLI at all; I'm just
>> > > > > > >>>>>> stating that in some cases like this, users would like to have
>> > > > > > >>>>>> a self-serving solution where they can provide SQL statements
>> > > > > > >>>>>> which can trigger alerts automatically.
>> > > > > > >>>>>>
>> > > > > > >>>>>> My personal opinion is that a CLI would be beneficial in
>> > > > > > >>>>>> several cases. A good example is when users want to restart a
>> > > > > > >>>>>> job from specific Kafka offsets which are persisted in a
>> > > > > > >>>>>> savepoint. In such a scenario users are more than happy, since
>> > > > > > >>>>>> they expect manual intervention with full control. So all in
>> > > > > > >>>>>> all, one can count on my +1 when a CLI FLIP comes up...
>> > > > > > >>>>>>
>> > > > > > >>>>>> BR,
>> > > > > > >>>>>> G
>> > > > > > >>>>>>
>> > > > > > >>>>>>
>> > > > > > >>>>>> On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <
>> > > > gyula.f...@gmail.com>
>> > > > > > >>>> wrote:
>> > > > > > >>>>>>
>> > > > > > >>>>>>> Hi!
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> @Zakelly Lan <zakelly....@gmail.com>
>> > > > > > >>>>>>> I think what Gabor means is that users want to have
>> > > predefined
>> > > > > SQL
>> > > > > > >>>>> scripts
>> > > > > > >>>>>>> to perform state analysis tasks to debug/identify
>> problems.
>> > > > > > >>>>>>> For example, they could write a SQL script that joins the
>> > > > > > >>>>>>> metadata table with the state and does some analytics on it.
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> If we have a meta table then the SQL script that can do
>> > this
>> > > is
>> > > > > > >> fixed
>> > > > > > >>>>> and
>> > > > > > >>>>>>> users can trigger this on demand by simply providing a
>> new
>> > > > > > >> savepoint
>> > > > > > >>>>> path.
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> If we have a different mechanism to extract metadata
>> that
>> > is
>> > > > not
>> > > > > > >> SQL
>> > > > > > >>>>>>> native
>> > > > > > >>>>>>> then manual steps need to be executed and a custom SQL
>> > script
>> > > > > would
>> > > > > > >>>> need
>> > > > > > >>>>>>> to
>> > > > > > >>>>>>> be written that adds the manually extracted metadata
>> into
>> > the
>> > > > > > >> script.
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> Cheers,
>> > > > > > >>>>>>> Gyula
>> > > > > > >>>>>>>
>> > > > > > >>>>>>> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <
>> > > > > zakelly....@gmail.com
>> > > > > > >>>
>> > > > > > >>>>>>> wrote:
>> > > > > > >>>>>>>
>> > > > > > >>>>>>>> Hi all,
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> Thanks for your answers! Getting everyone aligned on
>> this
>> > > > topic
>> > > > > > >> is
>> > > > > > >>>>>>>> challenging, but it's definitely worth the effort
>> since it
>> > > > will
>> > > > > > >>>> help
>> > > > > > >>>>>>>> streamline things moving forward.
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> @Gabor are you saying that users are using some
>> scripts to
>> > > > > define
>> > > > > > >>>> the
>> > > > > > >>>>>>> SQL
>> > > > > > >>>>>>>> metadata connector and get the information, right? If
>> so,
>> > > > would
>> > > > > a
>> > > > > > >>>> CLI
>> > > > > > >>>>>>> tool
>> > > > > > >>>>>>>> be more convenient? It's easy to invoke and can get the
>> > > result
>> > > > > > >>>>> swiftly.
>> > > > > > >>>>>>> And
>> > > > > > >>>>>>>> there should be some other systems to track the
>> checkpoint
>> > > > > > >> lineage
>> > > > > > >>>> and
>> > > > > > >>>>>>>> analyze if there are outliers in metadata (e.g. state
>> size
>> > > of
>> > > > > one
>> > > > > > >>>>>>> operator)
>> > > > > > >>>>>>>> right? Well, maybe I missed something so please
>> correct me
>> > > if
>> > > > > I'm
>> > > > > > >>>>> wrong.
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> I think the overall vision in Flink SQL is to provide a
>> > SQL
>> > > > > > >> native
>> > > > > > >>>>>>>>> environment where we can serve complex use-cases like
>> you
>> > > > would
>> > > > > > >>>>> expect
>> > > > > > >>>>>>>> in a
>> > > > > > >>>>>>>>> regular database.
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> @Gyula Well, this is a good point. From the
>> perspective of
>> > > > > > >>>>> comprehensive
>> > > > > > >>>>>>>> SQL experience, I'd +1 for treating metadata as data.
>> > > > Although I
>> > > > > > >>>> doubt
>> > > > > > >>>>>>> if
>> > > > > > >>>>>>>> there is a need for processing metadata, I won't be
>> > against
>> > > a
>> > > > > > >>>> separate
>> > > > > > >>>>>>>> connector.
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> Regarding the CLI tool, I still think it's worth
>> > > implementing.
>> > > > > > >>>> Such a
>> > > > > > >>>>>>> tool
>> > > > > > >>>>>>>> could provide savepoint information before resuming
>> from a
>> > > > > > >>>> savepoint,
>> > > > > > >>>>>>> which
>> > > > > > >>>>>>>> would enhance the user experience in CLI-based
>> workflows.
>> > It
>> > > > > > >> would
>> > > > > > >>>> be
>> > > > > > >>>>>>> good
>> > > > > > >>>>>>>> if someone could implement this feature. We shouldn't
>> > worry
>> > > > > about
>> > > > > > >>>>>>> whether
>> > > > > > >>>>>>>> this tool might be retired in the future. Regardless of
>> > the
>> > > > > > >>>> SQL-based
>> > > > > > >>>>>>>> solution we eventually adopt, this capability will
>> remain
>> > > > > > >> essential
>> > > > > > >>>>> for
>> > > > > > >>>>>>> CLI
>> > > > > > >>>>>>>> users. This is another topic.
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> Best,
>> > > > > > >>>>>>>> Zakelly
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <
>> > > > > > >> fskm...@gmail.com>
>> > > > > > >>>>>>> wrote:
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>>> Hi.
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> After reading the doc[1], I think Spark provides a
>> > function
>> > > > for
>> > > > > > >>>>> users
>> > > > > > >>>>>>> to
>> > > > > > >>>>>>>>> consume the metadata from the savepoint.  In Flink
>> SQL,
>> > > > similar
>> > > > > > >>>>>>>>> functionality is implemented through Polymorphic Table
>> > > > > > >> Functions
>> > > > > > >>>>>>> (PTF) as
>> > > > > > >>>>>>>>> proposed in FLIP-440[2]. Below is a code example[3]
>> > > > > > >> illustrating
>> > > > > > >>>>> this
>> > > > > > >>>>>>>>> concept:
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> ```
>> > > > > > >>>>>>>>>    public static class ScalarArgsFunction extends TestProcessTableFunctionBase {
>> > > > > > >>>>>>>>>        public void eval(Integer i, Boolean b) {
>> > > > > > >>>>>>>>>            collectObjects(i, b);
>> > > > > > >>>>>>>>>        }
>> > > > > > >>>>>>>>>    }
>> > > > > > >>>>>>>>> ```
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> ```
>> > > > > > >>>>>>>>> INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
>> > > > > > >>>>>>>>> ```
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> So we can add a builtin function named
>> > > `read_state_metadata`
>> > > > to
>> > > > > > >>>> read
>> > > > > > >>>>>>>>> savepoint data.
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> Best,
>> > > > > > >>>>>>>>> Shengkai
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> [1]
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>
>> > > > > > >>>>>
>> > > > > > >>>>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
>> > > > > > >>>>>>>>> [2]
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>
>> > > > > > >>>>>
>> > > > > > >>>>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
>> > > > > > >>>>>>>>> [3]
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>
>> > > > > > >>>>>>>
>> > > > > > >>>>>
>> > > > > > >>>>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>> On Wed, Mar 19, 2025 at 18:37, Gyula Fóra <gyula.f...@gmail.com> wrote:
>> > > > > > >>>>>>>>>
>> > > > > > >>>>>>>>>> Hi All!
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> Thank you for the answers and concerns from everyone.
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> On the CLI vs State Metadata Connector/Table
>> question I
>> > > > would
>> > > > > > >>>> also
>> > > > > > >>>>>>> like
>> > > > > > >>>>>>>>> to
>> > > > > > >>>>>>>>>> step back a little and look at the bigger picture.
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> I think the overall vision in Flink SQL is to
>> provide a
>> > > SQL
>> > > > > > >>>> native
>> > > > > > >>>>>>>>>> environment where we can serve complex use-cases like
>> > you
>> > > > > > >> would
>> > > > > > >>>>>>> expect
>> > > > > > >>>>>>>>> in a
>> > > > > > >>>>>>>>>> regular database.
>> > > > > > >>>>>>>>>> Most features and developments in recent years have gone
>> > > > > > >>>>>>>>>> this way.
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> The State Metadata Table would be a natural and
>> > > > > > >> straightforward
>> > > > > > >>>>> fit
>> > > > > > >>>>>>>> here.
>> > > > > > >>>>>>>>>> So from my side, +1 for that.
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> However I could understand if we are not ready to
>> add a
>> > > new
>> > > > > > >>>>>>>>>> connector/format due to maintenance concerns (and in
>> > > general
>> > > > > > >>>>> concern
>> > > > > > >>>>>>>>> about
>> > > > > > >>>>>>>>>> the design).
>> > > > > > >>>>>>>>>> If that's the issue then we should spend more time on
>> > the
>> > > > > > >>>> design
>> > > > > > >>>>> to
>> > > > > > >>>>>>> get
>> > > > > > >>>>>>>>>> comfortable with the approach and seek feedback from
>> the
>> > > > > > >> wider
>> > > > > > >>>>>>>> community
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> I am -1 for the CLI/tooling approach as that will not
>> > > > provide
>> > > > > > >>>> the
>> > > > > > >>>>>>>>>> featureset we are looking for that is not already
>> > covered
>> > > by
>> > > > > > >>>> the
>> > > > > > >>>>>>> Java
>> > > > > > >>>>>>>>>> connector. And that approach would come with the same
>> > > > > > >>>> maintenance
>> > > > > > >>>>>>>>>> implications.
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> Cheers
>> > > > > > >>>>>>>>>> Gyula
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <
>> > > > > > >>>>>>>>> gabor.g.somo...@gmail.com>
>> > > > > > >>>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>>> Hi Zakelly, Shengkai
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> Several topics are in flight, so I'm adding short answers
>> > > > > > >>>>>>>>>>> to each of them. If I haven't touched some topic, please
>> > > > > > >>>>>>>>>>> point it out.
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> @Shengkai: I've read through all the previous FLIPs
>> > > > > > >>>>>>>>>>> related to catalogs, and if we would like to keep the
>> > > > > > >>>>>>>>>>> concepts there, then a one-to-one mapping between
>> > > > > > >>>>>>>>>>> savepoint and catalog is a reasonable direction. In short,
>> > > > > > >>>>>>>>>>> I'm happy that you've highlighted this and I agree as a
>> > > > > > >>>>>>>>>>> whole. I've written it down previously, but I just want to
>> > > > > > >>>>>>>>>>> double confirm that the state catalog is essential and
>> > > > > > >>>>>>>>>>> planned. When we reach that point, your input is more than
>> > > > > > >>>>>>>>>>> welcome.
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> @Zakelly: We've already tried the CLI and separate library
>> > > > > > >>>>>>>>>>> approaches with users, and they were not welcomed for the
>> > > > > > >>>>>>>>>>> following reasons:
>> > > > > > >>>>>>>>>>> * Users want automated tasks, not manual parsing of
>> > > > > > >>>>>>>>>>> CLI/library output. This can be hacked around, but our
>> > > > > > >>>>>>>>>>> experience with this is negative because it's just brittle.
>> > > > > > >>>>>>>>>>> * From a development perspective it's a much bigger effort
>> > > > > > >>>>>>>>>>> than a connector (hard to test, packaging/version handling
>> > > > > > >>>>>>>>>>> is an extra layer of complexity, external FS authentication
>> > > > > > >>>>>>>>>>> is a pain for users, and so is expecting them to download
>> > > > > > >>>>>>>>>>> savepoints).
>> > > > > > >>>>>>>>>>> * Purely personal opinion, but if we find better ways
>> > > > > > >>>>>>>>>>> later, retiring a CLI is no more lightweight than retiring
>> > > > > > >>>>>>>>>>> a connector.
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> It would be great if you give some examples on how
>> > user
>> > > > > > >>>> could
>> > > > > > >>>>>>>>> leverage
>> > > > > > >>>>>>>>>>> the separate connector to process the metadata.
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> The simplest cases:
>> > > > > > >>>>>>>>>>> * give me the overgrowing state uids
>> > > > > > >>>>>>>>>>> * give me the unknown (new or renamed) state uids
>> > > > > > >>>>>>>>>>> * give me the state uids where state size dropped
>> > > > > > >>>>>>>>>>> drastically compared to a previous savepoint (accidental
>> > > > > > >>>>>>>>>>> state loss)
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> Since it was mentioned: as a general offtopic
>> teaser,
>> > > yeah
>> > > > > > >> it
>> > > > > > >>>>>>> would
>> > > > > > >>>>>>>> be
>> > > > > > >>>>>>>>>> good
>> > > > > > >>>>>>>>>>> to have some sort of checkpoint/savepoint lineage or
>> > > > > > >> however
>> > > > > > >>>> we
>> > > > > > >>>>>>> call
>> > > > > > >>>>>>>>> it.
>> > > > > > >>>>>>>>>>> Since we've not yet reached this point there are no
>> > > > > > >> technical
>> > > > > > >>>>>>>> details,
>> > > > > > >>>>>>>>>> it's
>> > > > > > >>>>>>>>>>> more like a vision. It's a common pattern that
>> > > > > > >>>>>>>>>>> jobs are physically running but somehow the state
>> > > > > > >> processing
>> > > > > > >>>> is
>> > > > > > >>>>>>> stuck
>> > > > > > >>>>>>>>> and
>> > > > > > >>>>>>>>>>> it would be good to add some way to find it out
>> > > > > > >>>> automatically.
>> > > > > > >>>>>>>>>>> The key point here is automation rather than manual
>> > > > > > >>>>>>>>>>> evaluation, since handling 10k+ jobs simply does not allow
>> > > > > > >>>>>>>>>>> the latter.
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> BR,
>> > > > > > >>>>>>>>>>> G
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>> On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <
>> > > > > > >>>>> fskm...@gmail.com>
>> > > > > > >>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> Hi, All.
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> About State Catalog, I want to share more thoughts
>> > about
>> > > > > > >>>> this.
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> In the initial design concept, I understood that a
>> > > > > > >>>> savepoint
>> > > > > > >>>>>>> and a
>> > > > > > >>>>>>>>>> state
>> > > > > > >>>>>>>>>>>> catalog have a one-to-one mapping relationship.
>> Each
>> > > > > > >>>> operator
>> > > > > > >>>>>>>>>> corresponds
>> > > > > > >>>>>>>>>>>> to a database, and the state of each operator is
>> > > > > > >>>> represented
>> > > > > > >>>>> as
>> > > > > > >>>>>>>>>>> individual
>> > > > > > >>>>>>>>>>>> tables. The rationale behind this design is:
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> *State Diversity*: An operator may involve multiple
>> > > types
>> > > > > > >>>> of
>> > > > > > >>>>>>>> states.
>> > > > > > >>>>>>>>>> For
>> > > > > > >>>>>>>>>>>> example, in our VVR design, a "multi-join" operator
>> > uses
>> > > > > > >>>> keyed
>> > > > > > >>>>>>>> states
>> > > > > > >>>>>>>>>> for
>> > > > > > >>>>>>>>>>>> two input streams and a broadcast state for the
>> third
>> > > > > > >>>> stream.
>> > > > > > >>>>>>> This
>> > > > > > >>>>>>>>>> makes
>> > > > > > >>>>>>>>>>> it
>> > > > > > >>>>>>>>>>>> challenging to represent all states of an operator
>> > > > > > >> within a
>> > > > > > >>>>>>> single
>> > > > > > >>>>>>>>>> table.
>> > > > > > >>>>>>>>>>>> *Scalability*: Internally, an operator might have
>> > > > > > >> multiple
>> > > > > > >>>>> keyed
>> > > > > > >>>>>>>>> states
>> > > > > > >>>>>>>>>>>> (e.g., value state and list state). However, large
>> > list
>> > > > > > >>>> states
>> > > > > > >>>>>>> may
>> > > > > > >>>>>>>>> not
>> > > > > > >>>>>>>>>>> fit
>> > > > > > >>>>>>>>>>>> entirely in memory. To address this, we recommend
>> > > > > > >>>> implementing
>> > > > > > >>>>>>> each
>> > > > > > >>>>>>>>>> state
>> > > > > > >>>>>>>>>>>> as a separate table.
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> To resolve the loosely coupled relationships
>> between
>> > > > > > >>>> operator
>> > > > > > >>>>>>>> states,
>> > > > > > >>>>>>>>>> we
>> > > > > > >>>>>>>>>>>> propose embedding predefined views within the
>> catalog.
>> > > > > > >>>> These
>> > > > > > >>>>>>> views
>> > > > > > >>>>>>>>>>> simplify
>> > > > > > >>>>>>>>>>>> user understanding of operator implementations and
>> > > > > > >> provide
>> > > > > > >>>> a
>> > > > > > >>>>>>> more
>> > > > > > >>>>>>>>>>> intuitive
>> > > > > > >>>>>>>>>>>> perspective. For instance, a join operator may have
>> > > > > > >>>> multiple
>> > > > > > >>>>>>> state
>> > > > > > >>>>>>>>>>>> implementations (depending on whether the join key
>> > > > > > >> includes
>> > > > > > >>>>>>> unique
>> > > > > > >>>>>>>>>>>> attributes), but users primarily care about the
>> data
>> > > > > > >>>>> associated
>> > > > > > >>>>>>>> with
>> > > > > > >>>>>>>>> a
>> > > > > > >>>>>>>>>>>> specific join key across input streams.
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> Returning to the one-to-one mapping between
>> savepoints
>> > > > > > >> and
>> > > > > > >>>>>>>> catalogs,
>> > > > > > >>>>>>>>> we
>> > > > > > >>>>>>>>>>> aim
>> > > > > > >>>>>>>>>>>> to manage multiple user state catalogs through a
>> > catalog
>> > > > > > >>>>> store.
>> > > > > > >>>>>>>> When
>> > > > > > >>>>>>>>> a
>> > > > > > >>>>>>>>>>> user
>> > > > > > >>>>>>>>>>>> triggers a savepoint for a job on the platform:
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> 1. The platform sends a REST request to the
>> > JobManager.
>> > > > > > >>>>>>>>>>>> 2. Simultaneously, it registers a new state
>> catalog in
>> > > > > > >> the
>> > > > > > >>>>>>> catalog
>> > > > > > >>>>>>>>>> store,
>> > > > > > >>>>>>>>>>>> enabling immediate analysis of state data on the
>> > > > > > >> platform.
>> > > > > > >>>>>>>>>>>> 3. Deleting a savepoint would also trigger the
>> removal
>> > > of
>> > > > > > >>>> its
>> > > > > > >>>>>>>>>> associated
>> > > > > > >>>>>>>>>>>> catalog.
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> This vision assumes that states are
>> self-describing or
>> > > > > > >>>> that a
>> > > > > > >>>>>>> state
>> > > > > > >>>>>>>>>>>> metaservice is introduced to analyze savepoint
>> > > > > > >> structures.
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> How can users create logic to identify differences
>> > > > > > >>>> between
>> > > > > > >>>>>>>> multiple
>> > > > > > >>>>>>>>>>>> savepoints?
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> Since savepoints and state catalogs are one-to-one
>> > > > > > >> mapped,
>> > > > > > >>>>> users
>> > > > > > >>>>>>>> can
>> > > > > > >>>>>>>>>>> query
>> > > > > > >>>>>>>>>>>> metadata via their respective catalogs. For
>> example:
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>`
>> > > > > > >>>>>>>>>>>> provides operator-specific metadata (e.g., state size,
>> > > > > > >>>>>>>>>>>> type).
>> > > > > > >>>>>>>>>>>> 2. Comparing metadata tables (e.g., schema
>> versions,
>> > > > > > >> state
>> > > > > > >>>>> entry
>> > > > > > >>>>>>>>>> counts)
>> > > > > > >>>>>>>>>>>> across catalogs reveals structural or quantitative
>> > > > > > >>>>> differences.
>> > > > > > >>>>>>>>>>>> 3. For deeper analysis, users could write SQL
>> queries
>> > to
>> > > > > > >>>>> compare
>> > > > > > >>>>>>>>>> specific
>> > > > > > >>>>>>>>>>>> state partitions or leverage the metaservice to
>> track
>> > > > > > >> state
>> > > > > > >>>>>>>> evolution
>> > > > > > >>>>>>>>>>>> (e.g., added/removed operators, modified state
>> > > > > > >>>>> configurations).
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> If we plan to introduce a state catalog in the
>> > future, I
>> > > > > > >>>> would
>> > > > > > >>>>>>> lean
>> > > > > > >>>>>>>>>>> toward
>> > > > > > >>>>>>>>>>>> using metadata tables. If a utility tool can
>> address
>> > the
>> > > > > > >>>>>>> challenges
>> > > > > > >>>>>>>>> we
>> > > > > > >>>>>>>>>>>> face, could we avoid introducing an additional
>> > > connector?
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> Best,
>> > > > > > >>>>>>>>>>>> Shengkai
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>> On Mon, Mar 17, 2025 at 20:25, Gyula Fóra <gyula.f...@gmail.com> wrote:
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> Hi All!
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> Without going into too much detail here are my 2
>> > cents
>> > > > > > >>>>>>> regarding
>> > > > > > >>>>>>>>> the
>> > > > > > >>>>>>>>>>>>> virtual column / catalog metadata / table
>> (connector)
>> > > > > > >>>>>>> discussion
>> > > > > > >>>>>>>>> for
>> > > > > > >>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>> State metadata.
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> State metadata such as the types of states, their
>> > > > > > >>>>> properties,
>> > > > > > >>>>>>>>> names,
>> > > > > > >>>>>>>>>>>> sizes
>> > > > > > >>>>>>>>>>>>> etc are all valuable information that can be used
>> to
>> > > > > > >>>> enrich
>> > > > > > >>>>>>> the
>> > > > > > >>>>>>>>>>>>> computations we do on state.
>> > > > > > >>>>>>>>>>>>> We can either analyze it standalone (such as
>> discover
>> > > > > > >>>>>>> anomalies,
>> > > > > > >>>>>>>>> for
>> > > > > > >>>>>>>>>>>> large
>> > > > > > >>>>>>>>>>>>> jobs with many states), across multiple savepoints
>> > > > > > >>>> (discover
>> > > > > > >>>>>>> how
>> > > > > > >>>>>>>>>> state
>> > > > > > >>>>>>>>>>>>> changed over time) or by joining it with keyed or
>> > > > > > >>>> non-keyed
>> > > > > > >>>>>>> state
>> > > > > > >>>>>>>>>> data
>> > > > > > >>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>> serve more complex queries on the state.
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> The only solution that seems to serve all these
>> > > > > > >> use-cases
>> > > > > > >>>>> and
>> > > > > > >>>>>>>>>>>> requirements
>> > > > > > >>>>>>>>>>>>> in a straightforward and SQL canonical way is to
>> > simply
>> > > > > > >>>>> expose
>> > > > > > >>>>>>>> the
>> > > > > > >>>>>>>>>>> state
>> > > > > > >>>>>>>>>>>>> metadata as a separate table. This is a metadata
>> > table
>> > > > > > >>>> but
>> > > > > > >>>>> you
>> > > > > > >>>>>>>> can
>> > > > > > >>>>>>>>>> also
>> > > > > > >>>>>>>>>>>>> think of it as data table, it makes no practical
>> > > > > > >>>> difference
>> > > > > > >>>>>>> here.
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> Once we have a catalog later, the catalog can
>> offer
>> > > > > > >> this
>> > > > > > >>>>> table
>> > > > > > >>>>>>>> out
>> > > > > > >>>>>>>>> of
>> > > > > > >>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>> box, the same way databases provide metadata
>> tables.
>> > > > > > >> For
>> > > > > > >>>>> this
>> > > > > > >>>>>>> to
>> > > > > > >>>>>>>>> work
>> > > > > > >>>>>>>>>>>>> however we need another, simpler connector that
>> > creates
>> > > > > > >>>> this
>> > > > > > >>>>>>>> table.
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> +1 for state metadata as a separate
>> connector/table,
>> > > > > > >>>> instead
>> > > > > > >>>>>>> of
>> > > > > > >>>>>>>>>> adding
>> > > > > > >>>>>>>>>>>>> virtual columns and adhoc catalog metadata that is
>> > hard
>> > > > > > >>>> to
>> > > > > > >>>>> use
>> > > > > > >>>>>>>> in a
>> > > > > > >>>>>>>>>>> large
>> > > > > > >>>>>>>>>>>>> number of queries.
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> Cheers,
>> > > > > > >>>>>>>>>>>>> Gyula
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>> On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <
>> > > > > > >>>>>>>>>>>> gabor.g.somo...@gmail.com>
>> > > > > > >>>>>>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> 1. State TTL for Value Columns
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>> I'm planning on adding this, and we may
>> collaborate
>> > > > > > >>>> on
>> > > > > > >>>>> it
>> > > > > > >>>>>>> in
>> > > > > > >>>>>>>>> the
>> > > > > > >>>>>>>>>>>>> future.
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> +1 on this, just ping me.
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> After some code digging and a POC, all I can say is
>> that
>> > > > > > >> with
>> > > > > > >>>>>>> heavy
>> > > > > > >>>>>>>>>> effort
>> > > > > > >>>>>>>>>>> we
>> > > > > > >>>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>> maybe add such changes that we're able to show
>> > > > > > >> metadata
>> > > > > > >>>>> of a
>> > > > > > >>>>>>>>>>> savepoint
>> > > > > > >>>>>>>>>>>>> from
>> > > > > > >>>>>>>>>>>>>> catalog.
>> > > > > > >>>>>>>>>>>>>> I'm not against that but from user perspective
>> this
>> > > > > > >> has
>> > > > > > >>>>>>> limited
>> > > > > > >>>>>>>>>>> value,
>> > > > > > >>>>>>>>>>>>> let
>> > > > > > >>>>>>>>>>>>>> me explain why.
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> From high level perspective I see the following
>> > > > > > >> which I
>> > > > > > >>>>> see
>> > > > > > >>>>>>>>>> agreement
>> > > > > > >>>>>>>>>>>> on:
>> > > > > > >>>>>>>>>>>>>> * We should have a catalog which is representing
>> one
>> > > > > > >> or
>> > > > > > >>>>> more
>> > > > > > >>>>>>>> jobs
>> > > > > > >>>>>>>>>>>>> savepoint
>> > > > > > >>>>>>>>>>>>>> data set (future plan)
>> > > > > > >>>>>>>>>>>>>> * Savepoints should be able to be registered in
>> the
>> > > > > > >>>>> catalog
>> > > > > > >>>>>>>> which
>> > > > > > >>>>>>>>>> are
>> > > > > > >>>>>>>>>>>>> then
>> > > > > > >>>>>>>>>>>>>> databases (future plan)
>> > > > > > >>>>>>>>>>>>>> * There must be a possibility to create tables
>> from
>> > > > > > >>>>> databases
>> > > > > > >>>>>>>>> where
>> > > > > > >>>>>>>>>>>> users
>> > > > > > >>>>>>>>>>>>>> can read state data (exists already)
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> In terms of metadata, If I understand correctly
>> then
>> > > > > > >>>> the
>> > > > > > >>>>>>>>> suggested
>> > > > > > >>>>>>>>>>>>> approach
>> > > > > > >>>>>>>>>>>>>> would be to access
>> > > > > > >>>>>>>>>>>>>> it from the catalog describe command, right?
>> Adding
>> > > > > > >>>> that
>> > > > > > >>>>>>> info
>> > > > > > >>>>>>>>> when
>> > > > > > >>>>>>>>>>>>> specific
>> > > > > > >>>>>>>>>>>>>> database describe command
>> > > > > > >>>>>>>>>>>>>> is executed could be done.
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> The question is for instance how can users create
>> > > > > > >> such
>> > > > > > >>>> a
>> > > > > > >>>>>>> logic
>> > > > > > >>>>>>>>> that
>> > > > > > >>>>>>>>>>>> tells
>> > > > > > >>>>>>>>>>>>>> them what is
>> > > > > > >>>>>>>>>>>>>> the difference between multiple savepoints?
>> > > > > > >>>>>>>>>>>>>> Just to give some examples:
>> > > > > > >>>>>>>>>>>>>> * per operator size changes between savepoints
>> > > > > > >>>>>>>>>>>>>> * show values from operator data where state size
>> > > > > > >>>> reaches
>> > > > > > >>>>> a
>> > > > > > >>>>>>>>>> boundary
>> > > > > > >>>>>>>>>>>>>> * in general "find which checkpoint ruined
>> things"
>> > is
>> > > > > > >>>>> quite
>> > > > > > >>>>>>>>> common
>> > > > > > >>>>>>>>>>>>> pattern
>> > > > > > >>>>>>>>>>>>>> What I would like to highlight here is that from
>> > > > > > >> Flink
>> > > > > > >>>>>>> point of
>> > > > > > >>>>>>>>>> view
>> > > > > > >>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>> metadata can be
>> > > > > > >>>>>>>>>>>>>> considered as a static side output information
>> but
>> > > > > > >> for
>> > > > > > >>>>> users
>> > > > > > >>>>>>>>> these
>> > > > > > >>>>>>>>>>>> values
>> > > > > > >>>>>>>>>>>>>> are actual real data
>> > > > > > >>>>>>>>>>>>>> where logic is planned to build around.
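The savepoint comparison cases listed above (per-operator size changes, finding the checkpoint where things went wrong) could be sketched as follows, assuming the state metadata is already available as plain rows. The field names `operator_id` and `state_size_bytes` and the function `size_changes` are hypothetical; the thread does not define a metadata schema at this point.

```python
def size_changes(old_rows, new_rows):
    """Return {operator_id: size delta in bytes} between two savepoints.

    Each row is a dict with the hypothetical keys 'operator_id' and
    'state_size_bytes'. Operators missing from one side count as size 0.
    """
    old = {r["operator_id"]: r["state_size_bytes"] for r in old_rows}
    new = {r["operator_id"]: r["state_size_bytes"] for r in new_rows}
    # Union of operator ids, so added and removed operators both show up.
    return {op: new.get(op, 0) - old.get(op, 0) for op in sorted(old.keys() | new.keys())}


def grown_beyond(deltas, threshold_bytes):
    """List the operators whose state grew by more than threshold_bytes."""
    return [op for op, delta in deltas.items() if delta > threshold_bytes]
```

With the metadata exposed as a table, the same comparison would be an ordinary join between two savepoints instead of parsing the output of a describe command.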
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>> The metadata is more like one-time information
>> > > > > > >>>> instead
>> > > > > > >>>>> of
>> > > > > > >>>>>>> a
>> > > > > > >>>>>>>>>>> streaming
>> > > > > > >>>>>>>>>>>>>> data that changes all
>> > > > > > >>>>>>>>>>>>>> the time, so a single connector seems to be an
>> > > > > > >>>> overkill.
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> State data is also static within a savepoint and
>> > > > > > >> that's
>> > > > > > >>>>> the
>> > > > > > >>>>>>>>> reason
>> > > > > > >>>>>>>>>>> why
>> > > > > > >>>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>> state processor API is working in batch mode.
>> > > > > > >>>>>>>>>>>>>> When we handle multiple checkpoints in a
>> streaming
>> > > > > > >>>> fashion
>> > > > > > >>>>>>> then
>> > > > > > >>>>>>>>>> this
>> > > > > > >>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>> be
>> > > > > > >>>>>>>>>>>>>> viewed from another angle.
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> We can come up with more lightweight solution
>> other
>> > > > > > >>>> than a
>> > > > > > >>>>>>> new
>> > > > > > >>>>>>>>>>>> connector
>> > > > > > >>>>>>>>>>>>>> but enforcing users to parse the catalog
>> > > > > > >>>>>>>>>>>>>> describe command output in order to compare
>> multiple
>> > > > > > >>>>>>> savepoints
>> > > > > > >>>>>>>>>>> doesn't
>> > > > > > >>>>>>>>>>>>>> sound smooth user experience.
>> > > > > > >>>>>>>>>>>>>> Honestly, I have no other idea for how to expose metadata
>> > > > > > >>>>>>>>>>>>>> as real user data, so I'm waiting on other approaches.
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> BR,
>> > > > > > >>>>>>>>>>>>>> G
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <
>> > > > > > >>>>>>>> fskm...@gmail.com
>> > > > > > >>>>>>>>>>
>> > > > > > >>>>>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>> Looking forward to hearing the good news!
>> > > > > > >>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>> Best,
>> > > > > > >>>>>>>>>>>>>>> Shengkai
>> > > > > > >>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 22:24, Gabor Somogyi
>> > > > > > >>>>>>>>>>>>>>> <gabor.g.somo...@gmail.com> wrote:
>> > > > > > >>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>> Thanks for both the valuable input!
>> > > > > > >>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>> Let me take a closer look at the suggestions,
>> > > > > > >> like
>> > > > > > >>>> the
>> > > > > > >>>>>>>>> Catalog
>> > > > > > >>>>>>>>>>>>>>> capabilities
>> > > > > > >>>>>>>>>>>>>>>> and possibility of embedding TypeInformation or
>> > > > > > >>>>>>>>>>>>>>>> StateDescriptor metadata directly into the raw
>> > > > > > >>>> state
>> > > > > > >>>>>>>> files...
>> > > > > > >>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>> BR,
>> > > > > > >>>>>>>>>>>>>>>> G
>> > > > > > >>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <
>> > > > > > >>>>>>>>>> fskm...@gmail.com
>> > > > > > >>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> Thanks for Zakelly's clarification.
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> +1 to delay the discussion about this.
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> I'd like to share my perspective on the State
>> > > > > > >>>>> Catalog
>> > > > > > >>>>>>>>>> proposal.
>> > > > > > >>>>>>>>>>>>> While
>> > > > > > >>>>>>>>>>>>>>>>> introducing this capability is beneficial,
>> > > > > > >> there
>> > > > > > >>>> is
>> > > > > > >>>>> a
>> > > > > > >>>>>>>>>> blocker:
>> > > > > > >>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>> current
>> > > > > > >>>>>>>>>>>>>>>>> StateBackend architecture does not permit
>> > > > > > >>>> operators
>> > > > > > >>>>> to
>> > > > > > >>>>>>>>> encode
>> > > > > > >>>>>>>>>>>>>>>>> TypeInformation into the state; it only
>> > > > > > >> preserves
>> > > > > > >>>> the
>> > > > > > >>>>>>>>>>> Serializer.
>> > > > > > >>>>>>>>>>>>> This
>> > > > > > >>>>>>>>>>>>>>>>> limitation creates an asymmetry, as operators
>> > > > > > >>>> alone
>> > > > > > >>>>>>>> retain
>> > > > > > >>>>>>>>>>>>> knowledge
>> > > > > > >>>>>>>>>>>>>> of
>> > > > > > >>>>>>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>>> data structure's schema.
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> To address this, I suggest allowing operators
>> > > > > > >> to
>> > > > > > >>>>> embed
>> > > > > > >>>>>>>>>>>>>> TypeInformation
>> > > > > > >>>>>>>>>>>>>>> or
>> > > > > > >>>>>>>>>>>>>>>>> StateDescriptor metadata directly into the raw
>> > > > > > >>>> state
>> > > > > > >>>>>>>> files.
>> > > > > > >>>>>>>>>>> Such
>> > > > > > >>>>>>>>>>>> a
>> > > > > > >>>>>>>>>>>>>>> design
>> > > > > > >>>>>>>>>>>>>>>>> would enable the Catalog to:
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> 1. Parse state files and programmatically
>> > > > > > >> derive
>> > > > > > >>>> the
>> > > > > > >>>>>>>> schema
>> > > > > > >>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>> structural
>> > > > > > >>>>>>>>>>>>>>>>> guarantees for each state.
>> > > > > > >>>>>>>>>>>>>>>>> 2. Leverage existing Flink Table utilities,
>> > > > > > >> such
>> > > > > > >>>> as
>> > > > > > >>>>>>>>>>>>>>>>> LegacyTypeInfoDataTypeConverter (in
>> > > > > > >>>>>>>>>>>>>>> org.apache.flink.table.types.utils),
>> > > > > > >>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>> bridge TypeInformation and DataType
>> > > > > > >> conversions.
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> If we can not store the TypeInformation or
>> > > > > > >>>>>>>> StateDescriptor
>> > > > > > >>>>>>>>>> into
>> > > > > > >>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>> raw
>> > > > > > >>>>>>>>>>>>>>>>> state files, I am +1 for this FLIP to use
>> > > > > > >>>> metadata
>> > > > > > >>>>>>> column
>> > > > > > >>>>>>>>> to
>> > > > > > >>>>>>>>>>>>> retrieve
>> > > > > > >>>>>>>>>>>>>>>>> information.
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> Best,
>> > > > > > >>>>>>>>>>>>>>>>> Shengkai
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 12:43, Zakelly Lan
>> > > > > > >>>>>>>>>>>>>>>>> <zakelly....@gmail.com> wrote:
>> > > > > > >>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> Hi Gabor and Shengkai,
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> Thanks for sharing your thoughts! This is a
>> > > > > > >>>> long
>> > > > > > >>>>>>>>> discussion
>> > > > > > >>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>> sorry
>> > > > > > >>>>>>>>>>>>>>>> for
>> > > > > > >>>>>>>>>>>>>>>>>> the late reply (I'm busy catching up with
>> > > > > > >>>> release
>> > > > > > >>>>>>> 2.0
>> > > > > > >>>>>>>>> these
>> > > > > > >>>>>>>>>>>>> days).
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> Let me first clarify your thoughts to ensure
>> > > > > > >> I
>> > > > > > >>>>>>>> understand
>> > > > > > >>>>>>>>>>>>>> correctly.
>> > > > > > >>>>>>>>>>>>>>>>> IIUC,
>> > > > > > >>>>>>>>>>>>>>>>>> there is no persistent configuration for
>> > > > > > >> state
>> > > > > > >>>> TTL
>> > > > > > >>>>>>> in
>> > > > > > >>>>>>>> the
>> > > > > > >>>>>>>>>>>>>> checkpoint.
>> > > > > > >>>>>>>>>>>>>>>>> While
>> > > > > > >>>>>>>>>>>>>>>>>> you can infer that TTL is enabled by reading
>> > > > > > >>>> the
>> > > > > > >>>>>>>>>> serializer,
>> > > > > > >>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>>> checkpoint
>> > > > > > >>>>>>>>>>>>>>>>>> itself only stores the last access time for
>> > > > > > >>>> each
>> > > > > > >>>>>>> value.
>> > > > > > >>>>>>>>> So
>> > > > > > >>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>> only
>> > > > > > >>>>>>>>>>>>>>>> thing
>> > > > > > >>>>>>>>>>>>>>>>>> we can show is the last access time for each
>> > > > > > >>>>> value.
>> > > > > > >>>>>>> But
>> > > > > > >>>>>>>>> it
>> > > > > > >>>>>>>>>> is
>> > > > > > >>>>>>>>>>>> not
>> > > > > > >>>>>>>>>>>>>>>>> required
>> > > > > > >>>>>>>>>>>>>>>>>> for all state backends to store this, as they
>> > > > > > >>>> may
>> > > > > > >>>>>>>>> directly
>> > > > > > >>>>>>>>>>>> store
>> > > > > > >>>>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>>>> expired time. This will also increase the
>> > > > > > >>>>>>> difficulty of
>> > > > > > >>>>>>>>>>>>>>> implementation
>> > > > > > >>>>>>>>>>>>>>>> &
>> > > > > > >>>>>>>>>>>>>>>>>> maintenance.
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> This once again reiterates the importance of
>> > > > > > >>>>> unified
>> > > > > > >>>>>>>>>> metadata
>> > > > > > >>>>>>>>>>>> for
>> > > > > > >>>>>>>>>>>>>>>>>> checkpoints. I'm planning on adding this, and
>> > > > > > >>>> we
>> > > > > > >>>>> may
>> > > > > > >>>>>>>>>>>> collaborate
>> > > > > > >>>>>>>>>>>>> on
>> > > > > > >>>>>>>>>>>>>>> it
>> > > > > > >>>>>>>>>>>>>>>> in
>> > > > > > >>>>>>>>>>>>>>>>>> the future.
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> I'm not in favor of adding a new connector
>> > > > > > >> for
>> > > > > > >>>>>>>> metadata.
>> > > > > > >>>>>>>>>> The
>> > > > > > >>>>>>>>>>>>>> metadata
>> > > > > > >>>>>>>>>>>>>>>> is
>> > > > > > >>>>>>>>>>>>>>>>>> more like one-time information instead of a
>> > > > > > >>>>>>> streaming
>> > > > > > >>>>>>>>> data
>> > > > > > >>>>>>>>>>> that
>> > > > > > >>>>>>>>>>>>>>> changes
>> > > > > > >>>>>>>>>>>>>>>>> all
>> > > > > > >>>>>>>>>>>>>>>>>> the time, so a single connector seems to be
>> > > > > > >> an
>> > > > > > >>>>>>>> overkill.
>> > > > > > >>>>>>>>> It
>> > > > > > >>>>>>>>>>> is
>> > > > > > >>>>>>>>>>>>> not
>> > > > > > >>>>>>>>>>>>>>> easy
>> > > > > > >>>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>> withdraw a connector if we have a better
>> > > > > > >>>> solution
>> > > > > > >>>>> in
>> > > > > > >>>>>>>>>> future.
>> > > > > > >>>>>>>>>>>> I'm
>> > > > > > >>>>>>>>>>>>>> not
>> > > > > > >>>>>>>>>>>>>>>>>> familiar with current Catalog capabilities,
>> > > > > > >>>> and if
>> > > > > > >>>>>>> it
>> > > > > > >>>>>>>>> could
>> > > > > > >>>>>>>>>>>>> extract
>> > > > > > >>>>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>>>> show some operator-level information from
>> > > > > > >>>>> savepoint,
>> > > > > > >>>>>>>> that
>> > > > > > >>>>>>>>>>> would
>> > > > > > >>>>>>>>>>>>> be
>> > > > > > >>>>>>>>>>>>>>>> great.
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> If the Catalog can't do that, I would
>> > > > > > >> consider
>> > > > > > >>>> the
>> > > > > > >>>>>>>>> current
>> > > > > > >>>>>>>>>>> FLIP
>> > > > > > >>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>> be a
>> > > > > > >>>>>>>>>>>>>>>>>> compromise solution.
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> And if we have that unified metadata for
>> > > > > > >>>>>>>>>> checkpoint/savepoint
>> > > > > > >>>>>>>>>>>> in
>> > > > > > >>>>>>>>>>>>>>>> future,
>> > > > > > >>>>>>>>>>>>>>>>> we
>> > > > > > >>>>>>>>>>>>>>>>>> may directly register savepoint in catalog,
>> > > > > > >> and
>> > > > > > >>>>>>> create
>> > > > > > >>>>>>>> a
>> > > > > > >>>>>>>>>>> source
>> > > > > > >>>>>>>>>>>>>>> without
>> > > > > > >>>>>>>>>>>>>>>>>> specifying complex columns, as well as
>> > > > > > >> describe
>> > > > > > >>>>> the
>> > > > > > >>>>>>>>>> savepoint
>> > > > > > >>>>>>>>>>>>>> catalog
>> > > > > > >>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>> get the metadata. That's a good solution in
>> > > > > > >> my
>> > > > > > >>>>> mind.
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> Best,
>> > > > > > >>>>>>>>>>>>>>>>>> Zakelly
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 10:35 AM Shengkai
>> > > > > > >> Fang
>> > > > > > >>>> <
>> > > > > > >>>>>>>>>>>>> fskm...@gmail.com>
>> > > > > > >>>>>>>>>>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>> Hi Gabor,
>> > > > > > >>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with
>> > > > > > >>>>>>> `savepoint-metadata`
>> > > > > > >>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>> I would argue against introducing a new
>> > > > > > >>>>> connector
>> > > > > > >>>>>>>> type
>> > > > > > >>>>>>>>>>> named
>> > > > > > >>>>>>>>>>>>>>>>>>> savepoint-metadata, as the existing Catalog
>> > > > > > >>>>>>> mechanism
>> > > > > > >>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>> inherently
>> > > > > > >>>>>>>>>>>>>>>>>>> provide the necessary connector factory
>> > > > > > >>>>>>> capabilities.
>> > > > > > >>>>>>>>>> I've
>> > > > > > >>>>>>>>>>>>>> detailed
>> > > > > > >>>>>>>>>>>>>>>>> this
>> > > > > > >>>>>>>>>>>>>>>>>>> proposal in branch[1]. Please take a moment
>> > > > > > >>>> to
>> > > > > > >>>>>>> review
>> > > > > > >>>>>>>>> it.
>> > > > > > >>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>> If we introduce a connector named
>> > > > > > >>>>>>>> `savepoint-metadata`,
>> > > > > > >>>>>>>>>> it
>> > > > > > >>>>>>>>>>>>> means
>> > > > > > >>>>>>>>>>>>>>> user
>> > > > > > >>>>>>>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>>>>>> create a temporary table with connector
>> > > > > > >>>>>>>>>>> `savepoint-metadata`
>> > > > > > >>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>>>>> connector needs to check whether table
>> > > > > > >>>> schema is
>> > > > > > >>>>>>> same
>> > > > > > >>>>>>>>> to
>> > > > > > >>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>> schema
>> > > > > > >>>>>>>>>>>>>>>> we
>> > > > > > >>>>>>>>>>>>>>>>>>> proposed in the FLIP. On the other hand,
>> > > > > > >> it's
>> > > > > > >>>>> not
>> > > > > > >>>>>>>> easy
>> > > > > > >>>>>>>>>> work
>> > > > > > >>>>>>>>>>>> for
>> > > > > > >>>>>>>>>>>>>>>> others
>> > > > > > >>>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>>> use a metadata table with the same schema.
>> > > > > > >>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>> [1]
>> > > > > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
>> > > > > > >>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>> Best,
>> > > > > > >>>>>>>>>>>>>>>>>>> Shengkai
>> > > > > > >>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 16:56, Gabor Somogyi
>> > > > > > >>>>>>>>>>>>>>>>>>> <gabor.g.somo...@gmail.com> wrote:
>> > > > > > >>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> From a directional perspective I agree with your
>> > > > > > >>>> idea
>> > > > > > >>>>>>> how
>> > > > > > >>>>>>>> it
>> > > > > > >>>>>>>>>> can
>> > > > > > >>>>>>>>>>>> be
>> > > > > > >>>>>>>>>>>>>>>>>> implemented.
>> > > > > > >>>>>>>>>>>>>>>>>>>> Previously I've mentioned that TTL
>> > > > > > >>>> information
>> > > > > > >>>>>>> is
>> > > > > > >>>>>>>> not
>> > > > > > >>>>>>>>>>>> exposed
>> > > > > > >>>>>>>>>>>>>> on
>> > > > > > >>>>>>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>>>>> state
>> > > > > > >>>>>>>>>>>>>>>>>>>> processor API (which the SQL state
>> > > > > > >>>> connector
>> > > > > > >>>>>>> uses
>> > > > > > >>>>>>>> to
>> > > > > > >>>>>>>>>> read
>> > > > > > >>>>>>>>>>>>> data)
>> > > > > > >>>>>>>>>>>>>>>>>>>> and unless somebody shows me the opposite
>> > > > > > >>>> this
>> > > > > > >>>>>>> FLIP
>> > > > > > >>>>>>>> is
>> > > > > > >>>>>>>>>> not
>> > > > > > >>>>>>>>>>>>> going
>> > > > > > >>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>>> address
>> > > > > > >>>>>>>>>>>>>>>>>>>> this to avoid feature creep. Our users
>> > > > > > >> are
>> > > > > > >>>>> also
>> > > > > > >>>>>>>>>>> interested
>> > > > > > >>>>>>>>>>>> in
>> > > > > > >>>>>>>>>>>>>> TTL
>> > > > > > >>>>>>>>>>>>>>>> so
>> > > > > > >>>>>>>>>>>>>>>>>>>> sooner or later we're going to expose it,
>> > > > > > >>>> this
>> > > > > > >>>>>>> is
>> > > > > > >>>>>>>>>> matter
>> > > > > > >>>>>>>>>>> of
>> > > > > > >>>>>>>>>>>>>>>>> scheduling.
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with
>> > > > > > >>>>>>>> `savepoint-metadata`
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> Not sure I understand your point at all
>> > > > > > >>>>> related
>> > > > > > >>>>>>>>>>>> StateCatalog.
>> > > > > > >>>>>>>>>>>>>>> First
>> > > > > > >>>>>>>>>>>>>>>>> of
>> > > > > > >>>>>>>>>>>>>>>>>>> all
>> > > > > > >>>>>>>>>>>>>>>>>>>> I can't agree more that StateCatalog is
>> > > > > > >>>> needed
>> > > > > > >>>>>>> and
>> > > > > > >>>>>>>>> is a
>> > > > > > >>>>>>>>>>>>> planned
>> > > > > > >>>>>>>>>>>>>>>>>> building
>> > > > > > >>>>>>>>>>>>>>>>>>>> block in an upcoming
>> > > > > > >>>>>>>>>>>>>>>>>>>> FLIP but not sure how can it help now? No
>> > > > > > >>>>> matter
>> > > > > > >>>>>>>>> what,
>> > > > > > >>>>>>>>>>> your
>> > > > > > >>>>>>>>>>>>>>>> knowledge
>> > > > > > >>>>>>>>>>>>>>>>>> is
>> > > > > > >>>>>>>>>>>>>>>>>>>> essential when we add StateCatalog. Let
>> > > > > > >> me
>> > > > > > >>>>>>> expose
>> > > > > > >>>>>>>> my
>> > > > > > >>>>>>>>>>>>>>> understanding
>> > > > > > >>>>>>>>>>>>>>>> in
>> > > > > > >>>>>>>>>>>>>>>>>>> this
>> > > > > > >>>>>>>>>>>>>>>>>>>> area:
>> > > > > > >>>>>>>>>>>>>>>>>>>> * First we need create table statements
>> > > > > > >> to
>> > > > > > >>>>>>> access
>> > > > > > >>>>>>>>> state
>> > > > > > >>>>>>>>>>>> data
>> > > > > > >>>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>>>> metadata
>> > > > > > >>>>>>>>>>>>>>>>>>>> * When we have that then we can add
>> > > > > > >>>>> StateCatalog
>> > > > > > >>>>>>>>> which
>> > > > > > >>>>>>>>>>>> could
>> > > > > > >>>>>>>>>>>>>>>>>> potentially
>> > > > > > >>>>>>>>>>>>>>>>>>>> ease the life of users by for ex. giving
>> > > > > > >>>>>>>>> off-the-shelf
>> > > > > > >>>>>>>>>>>> tables
>> > > > > > >>>>>>>>>>>>>>>> without
>> > > > > > >>>>>>>>>>>>>>>>>>>> sweating with create table statements
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> User expectations:
>> > > > > > >>>>>>>>>>>>>>>>>>>> * See state data (this is fulfilled with
>> > > > > > >>>> the
>> > > > > > >>>>>>>> existing
>> > > > > > >>>>>>>>>>>>>> connector)
>> > > > > > >>>>>>>>>>>>>>>>>>>> * See metadata about state data like TTL
>> > > > > > >>>> (this
>> > > > > > >>>>>>> can
>> > > > > > >>>>>>>> be
>> > > > > > >>>>>>>>>>> added
>> > > > > > >>>>>>>>>>>>> as
>> > > > > > >>>>>>>>>>>>>>>>> metadata
>> > > > > > >>>>>>>>>>>>>>>>>>>> column as you suggested since it belongs
>> > > > > > >> to
>> > > > > > >>>>> the
>> > > > > > >>>>>>>> data)
>> > > > > > >>>>>>>>>>>>>>>>>>>> * See metadata about operators (this can
>> > > > > > >> be
>> > > > > > >>>>>>> added
>> > > > > > >>>>>>>>> from
>> > > > > > >>>>>>>>>>>>>>>>>>> savepoint-metadata)
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> Important to highlight that state data
>> > > > > > >>>> table
>> > > > > > >>>>>>> format
>> > > > > > >>>>>>>>>>> differs
>> > > > > > >>>>>>>>>>>>>> from
>> > > > > > >>>>>>>>>>>>>>>>> state
>> > > > > > >>>>>>>>>>>>>>>>>>>> metadata table format. Namely one table
>> > > > > > >> has
>> > > > > > >>>>> rows
>> > > > > > >>>>>>>> for
>> > > > > > >>>>>>>>>>> state
>> > > > > > >>>>>>>>>>>>>> values
>> > > > > > >>>>>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>>>>>> another has rows for operators, right?
>> > > > > > >>>>>>>>>>>>>>>>>>>> I think that's the reason why you've
>> > > > > > >>>>> pointed
>> > > > > > >>>>>>> out
>> > > > > > >>>>>>>>>> that
>> > > > > > >>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>>> suggested
>> > > > > > >>>>>>>>>>>>>>>>>>>> metadata columns are somewhat clunky.
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> As a conclusion I agree to add
>> > > > > > >>>>> ${state-name}_ttl
>> > > > > > >>>>>>>>>> metadata
>> > > > > > >>>>>>>>>>>>>> column
>> > > > > > >>>>>>>>>>>>>>>>> later
>> > > > > > >>>>>>>>>>>>>>>>>> on
>> > > > > > >>>>>>>>>>>>>>>>>>>> since it belongs to the state value and
>> > > > > > >>>>> adding a
>> > > > > > >>>>>>>> new
>> > > > > > >>>>>>>>>>> table
>> > > > > > >>>>>>>>>>>>> type
>> > > > > > >>>>>>>>>>>>>>>> (like
>> > > > > > >>>>>>>>>>>>>>>>>> you
>> > > > > > >>>>>>>>>>>>>>>>>>>> suggested similar to PG [1])
>> > > > > > >>>>>>>>>>>>>>>>>>>> for metadata. Please see how Spark does
>> > > > > > >>>> that
>> > > > > > >>>>> too
>> > > > > > >>>>>>>> [2].
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> If you have better approach then please
>> > > > > > >>>>>>> elaborate
>> > > > > > >>>>>>>>> with
>> > > > > > >>>>>>>>>>> more
>> > > > > > >>>>>>>>>>>>>>> details
>> > > > > > >>>>>>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>>>>>> help me to understand your point.
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB
>> > > > > > >>>>> savepoints
>> > > > > > >>>>>>>> that
>> > > > > > >>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>> number
>> > > > > > >>>>>>>>>>>>>>> of
>> > > > > > >>>>>>>>>>>>>>>>> keys
>> > > > > > >>>>>>>>>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>>>>>>>> be extremely huge but not the per key
>> > > > > > >>>> state
>> > > > > > >>>>>>>> itself.
>> > > > > > >>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is
>> > > > > > >>>> and
>> > > > > > >>>>>>> can
>> > > > > > >>>>>>>> be
>> > > > > > >>>>>>>>>>>> handled
>> > > > > > >>>>>>>>>>>>>> in a
>> > > > > > >>>>>>>>>>>>>>>>>>> separate
>> > > > > > >>>>>>>>>>>>>>>>>>>>> jira.
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> I've just created https://issues.apache.org/jira/browse/FLINK-37456.
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>> > > > > > >>>>>>>>>>>>>>>>>>>> [2]
>> > > > > > >>>>>>>>>>>>>>>>>>>> https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> BR,
>> > > > > > >>>>>>>>>>>>>>>>>>>> G
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 3:55 AM Shengkai
>> > > > > > >>>> Fang
>> > > > > > >>>>> <
>> > > > > > >>>>>>>>>>>>>> fskm...@gmail.com
>> > > > > > >>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>> wrote:
>> > > > > > >>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for your response.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> Thank you for addressing the
>> > > > > > >> limitations
>> > > > > > >>>>> here.
>> > > > > > >>>>>>>>>>> However, I
>> > > > > > >>>>>>>>>>>>>>> believe
>> > > > > > >>>>>>>>>>>>>>>>> it
>> > > > > > >>>>>>>>>>>>>>>>>>>> would
>> > > > > > >>>>>>>>>>>>>>>>>>>>> be beneficial to further clarify the
>> > > > > > >> API
>> > > > > > >>>> in
>> > > > > > >>>>>>> this
>> > > > > > >>>>>>>>> FLIP
>> > > > > > >>>>>>>>>>>>>> regarding
>> > > > > > >>>>>>>>>>>>>>>> how
>> > > > > > >>>>>>>>>>>>>>>>>>> users
>> > > > > > >>>>>>>>>>>>>>>>>>>>> can specify the TTL column.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> One potential approach that comes to
>> > > > > > >>>> mind is
>> > > > > > >>>>>>>> using
>> > > > > > >>>>>>>>> a
>> > > > > > >>>>>>>>>>>>>>> standardized
>> > > > > > >>>>>>>>>>>>>>>>>>> naming
>> > > > > > >>>>>>>>>>>>>>>>>>>>> convention such as ${state-name}_ttl
>> > > > > > >> for
>> > > > > > >>>> the
>> > > > > > >>>>>>>>> metadata
>> > > > > > >>>>>>>>>>>>> column
>> > > > > > >>>>>>>>>>>>>>> that
>> > > > > > >>>>>>>>>>>>>>>>>>> defines
>> > > > > > >>>>>>>>>>>>>>>>>>>>> the TTL value. In terms of
>> > > > > > >>>> implementation,
>> > > > > > >>>>> the
>> > > > > > >>>>>>>>>>>>>>>> listReadableMetadata
>> > > > > > >>>>>>>>>>>>>>>>>>>>> function could:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. Read the table's columns and
>> > > > > > >>>>> configuration,
>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Extract all defined state names, and
>> > > > > > >>>>>>>>>>>>>>>>>>>>> 3. Return a structured list of metadata
>> > > > > > >>>>>>> entries
>> > > > > > >>>>>>>>>>> formatted
>> > > > > > >>>>>>>>>>>>> as
>> > > > > > >>>>>>>>>>>>>>>>>>>>> ${state-name}_ttl.
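A minimal sketch of the three steps above, written in Python for brevity rather than against Flink's actual `listReadableMetadata` interface. The function name `ttl_metadata_keys` and its input are hypothetical; only the `${state-name}_ttl` naming convention comes from the proposal.

```python
def ttl_metadata_keys(state_names):
    """Return one '<state-name>_ttl' metadata key per distinct state name.

    state_names stands in for the state names extracted from the table's
    columns and configuration (steps 1 and 2); the result is the list of
    metadata entries from step 3. Input order is preserved.
    """
    seen = set()
    keys = []
    for name in state_names:
        if name in seen:
            continue  # a state referenced twice still gets a single TTL key
        seen.add(name)
        keys.append(f"{name}_ttl")
    return keys
```

Deriving the keys from the declared state names means no TTL key is hard-coded, so the metadata list always matches the table definition.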
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> WDYT?
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with
>> > > > > > >>>>>>>>> `savepoint-metadata`
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> Introducing a new connector type at
>> > > > > > >> this
>> > > > > > >>>>> stage
>> > > > > > >>>>>>>> may
>> > > > > > >>>>>>>>>>>>>>> unnecessarily
>> > > > > > >>>>>>>>>>>>>>>>>>>> complicate
>> > > > > > >>>>>>>>>>>>>>>>>>>>> the system. Given that every table
>> > > > > > >>>> already
>> > > > > > >>>>>>>> belongs
>> > > > > > >>>>>>>>>> to a
>> > > > > > >>>>>>>>>>>>>>> Catalog,
>> > > > > > >>>>>>>>>>>>>>>>>> which
>> > > > > > >>>>>>>>>>>>>>>>>>> is
>> > > > > > >>>>>>>>>>>>>>>>>>>>> designed to provide a Factory for
>> > > > > > >>>> building
>> > > > > > >>>>>>> source
>> > > > > > >>>>>>>>> or
>> > > > > > >>>>>>>>>>> sink
>> > > > > > >>>>>>>>>>>>>>>>>> connectors, I
>> > > > > > >>>>>>>>>>>>>>>>>>>>> propose integrating a dedicated
>> > > > > > >>>> StateCatalog
>> > > > > > >>>>>>>>> instead.
>> > > > > > >>>>>>>>>>>> This
>> > > > > > >>>>>>>>>>>>>>>> approach
>> > > > > > >>>>>>>>>>>>>>>>>>> would
>> > > > > > >>>>>>>>>>>>>>>>>>>>> allow us to:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> 1. Leverage the Catalog's existing
>> > > > > > >>>>>>> capabilities
>> > > > > > >>>>>>>> to
>> > > > > > >>>>>>>>>>> manage
>> > > > > > >>>>>>>>>>>>> TTL
>> > > > > > >>>>>>>>>>>>>>>>>> metadata
>> > > > > > >>>>>>>>>>>>>>>>>>>>> (e.g., state names and TTL logic)
>> > > > > > >> without
>> > > > > > >>>>>>>>> duplicating
>> > > > > > >>>>>>>>>>>>>>>>> functionality.
>> > > > > > >>>>>>>>>>>>>>>>>>>>> 2. Provide a unified interface for
>> > > > > > >>>> connector
>> > > > > > >>>>>>>>>>>> instantiation
>> > > > > > >>>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>>>> metadata
>> > > > > > >>>>>>>>>>>>>>>>>>>>> handling through the Catalog's Factory
>> > > > > > >>>>>>> pattern.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> Would this design decision better align
>> > > > > > >>>> with
>> > > > > > >>>>>>> our
>> > > > > > >>>>>>>>>>>>>> architecture's
>> > > > > > >>>>>>>>>>>>>>>>>>>>> extensibility and reduce redundancy?
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB
>> > > > > > >>>>>>> savepoints
>> > > > > > >>>>>>>>> that
>> > > > > > >>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>> number
>> > > > > > >>>>>>>>>>>>>>>> of
>> > > > > > >>>>>>>>>>>>>>>>>> keys
>> > > > > > >>>>>>>>>>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> be extremely huge but not the per key
>> > > > > > >>>>> state
>> > > > > > >>>>>>>>> itself.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature
>> > > > > > >> as-is
>> > > > > > >>>>> and
>> > > > > > >>>>>>> can
>> > > > > > >>>>>>>>> be
>> > > > > > >>>>>>>>>>>>> handled
>> > > > > > >>>>>>>>>>>>>>> in a
>> > > > > > >>>>>>>>>>>>>>>>>>>> separate
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> jira.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> +1 for a separate jira.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> Best,
>> > > > > > >>>>>>>>>>>>>>>>>>>>> Shengkai
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 10, 2025 at 19:05:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Please see my comments inline.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> BR,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> G
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <fskm...@gmail.com> wrote:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for the FLIP. I have some questions
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> about the FLIP:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> How can users retrieve the state
>> > > > > > >> TTL
>> > > > > > >>>>>>>>>> (Time-to-Live)
>> > > > > > >>>>>>>>>>>> for
>> > > > > > >>>>>>>>>>>>>>> each
>> > > > > > >>>>>>>>>>>>>>>>>> value
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> column?
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> From my understanding of the
>> > > > > > >> current
>> > > > > > >>>>>>> design,
>> > > > > > >>>>>>>> it
>> > > > > > >>>>>>>>>>> seems
>> > > > > > >>>>>>>>>>>>>> that
>> > > > > > >>>>>>>>>>>>>>>> this
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> functionality is not supported.
>> > > > > > >> Could
>> > > > > > >>>>> you
>> > > > > > >>>>>>>>> clarify
>> > > > > > >>>>>>>>>>> if
>> > > > > > >>>>>>>>>>>>>> there
>> > > > > > >>>>>>>>>>>>>>>> are
>> > > > > > >>>>>>>>>>>>>>>>>>> plans
>> > > > > > >>>>>>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> address this limitation?
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Since the state processor API is not
>> > > > > > >>>> yet
>> > > > > > >>>>>>>> exposing
>> > > > > > >>>>>>>>>>> this
>> > > > > > >>>>>>>>>>>>>>>>> information
>> > > > > > >>>>>>>>>>>>>>>>>>> this
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> would require several steps.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> First, the state processor API
>> > > > > > >> support
>> > > > > > >>>>>>> needs to
>> > > > > > >>>>>>>>> be
>> > > > > > >>>>>>>>>>>> added
>> > > > > > >>>>>>>>>>>>>>> which
>> > > > > > >>>>>>>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>>>>> be
>> > > > > > >>>>>>>>>>>>>>>>>>>>> then
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> exposed on the SQL API.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This is definitely a future
>> > > > > > >> improvement
>> > > > > > >>>>>>> which
>> > > > > > >>>>>>>> is
>> > > > > > >>>>>>>>>>> useful
>> > > > > > >>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>>>> be
>> > > > > > >>>>>>>>>>>>>>>>>>>>> handled
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> in a separate jira.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata
>> > > > > > >> Column
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> The metadata information described
>> > > > > > >> in
>> > > > > > >>>>> the
>> > > > > > >>>>>>>> FLIP
>> > > > > > >>>>>>>>>>>> appears
>> > > > > > >>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>> be
>> > > > > > >>>>>>>>>>>>>>>>>>> intended
>> > > > > > >>>>>>>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> describe the state files stored at
>> > > > > > >> a
>> > > > > > >>>>>>> specific
>> > > > > > >>>>>>>>>>>> location.
>> > > > > > >>>>>>>>>>>>>> To
>> > > > > > >>>>>>>>>>>>>>>> me,
>> > > > > > >>>>>>>>>>>>>>>>>> this
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> concept
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> aligns more closely with system
>> > > > > > >>>> tables
>> > > > > > >>>>>>> like
>> > > > > > >>>>>>>>>>> pg_tables
>> > > > > > >>>>>>>>>>>>> in
>> > > > > > >>>>>>>>>>>>>>>>>> PostgreSQL
>> > > > > > >>>>>>>>>>>>>>>>>>>> [1]
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> or
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> the INFORMATION_SCHEMA in MySQL
>> > > > > > >> [2].
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Adding a new connector with
>> > > > > > >>>>>>>> `savepoint-metadata`
>> > > > > > >>>>>>>>>> is a
>> > > > > > >>>>>>>>>>>>>>>> possibility
>> > > > > > >>>>>>>>>>>>>>>>>>> where
>> > > > > > >>>>>>>>>>>>>>>>>>>>> we
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> can create such functionality.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> I'm not against that, just want to
>> > > > > > >>>> have a
>> > > > > > >>>>>>>> common
>> > > > > > >>>>>>>>>>>>> agreement
>> > > > > > >>>>>>>>>>>>>>> that
>> > > > > > >>>>>>>>>>>>>>>>> we
>> > > > > > >>>>>>>>>>>>>>>>>>>> would
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> like to move that direction.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> (As a side note not just PG but Spark
>> > > > > > >>>> also
>> > > > > > >>>>>>> has
>> > > > > > >>>>>>>>>>> similar
>> > > > > > >>>>>>>>>>>>>>> approach
>> > > > > > >>>>>>>>>>>>>>>>>> and I
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> basically like the idea).
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> If we go in that direction, savepoint metadata can be
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> exposed so that one row represents an operator with
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> its values, something like this:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> | operatorName              | operatorUid                   | operatorHash                     | parallelism | maxParallelism | subtaskStatesCount | coordinatorStateSizeInBytes | totalStatesSizeInBytes |
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> |---------------------------|-------------------------------|----------------------------------|-------------|----------------|--------------------|-----------------------------|------------------------|
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> | Source: datagen-source    | datagen-source-uid            | 47aee94394d6ea26e2d544bef0a37bb5 | 2           | 128            | 2                  | 16                          | 546                    |
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> | long-udf-with-master-hook | long-udf-with-master-hook-uid | 6ed3f40bff3c8dfcdfcb95128a1018f1 | 2           | 128            | 2                  | 0                           | 0                      |
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> | value-process             | value-process-uid             | ca4f5fe9a637b656f09ea78b3e7a15b9 | 2           | 128            | 2                  | 0                           | 40726                  |
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This table can then be joined with
>> > > > > > >> the
>> > > > > > >>>>>>> actually
>> > > > > > >>>>>>>>>>>> existing
>> > > > > > >>>>>>>>>>>>>>>>>> `savepoint`
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> connector created tables based on UID
>> > > > > > >>>> hash
>> > > > > > >>>>>>>> (which
>> > > > > > >>>>>>>>>> is
>> > > > > > >>>>>>>>>>>>> unique
>> > > > > > >>>>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>>>>> always
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> exists).
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> This would mean that the already
>> > > > > > >>>> existing
>> > > > > > >>>>>>> table
>> > > > > > >>>>>>>>>> would
>> > > > > > >>>>>>>>>>>>> need
>> > > > > > >>>>>>>>>>>>>>>> only a
>> > > > > > >>>>>>>>>>>>>>>>>>>> single
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> metadata column which is the UID
>> > > > > > >> hash.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> WDYT?
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> @zakelly, plz share your thoughts
>> > > > > > >> too.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
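The join on the UID hash described above could be pictured roughly as follows. This is only an in-memory sketch in plain Java, not Flink SQL and not the connector's implementation: the record types, the `joinOnHash` method, and the sample state rows are all made up for illustration, and only a subset of the metadata columns is modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a metadata table with one row per operator, joined with
// state rows that carry the operator UID hash as their only metadata column.
final class UidHashJoinSketch {
    record OperatorMeta(String operatorUid, String operatorHash, int parallelism) {}
    record StateRow(String operatorHash, String key, long value) {}
    record Joined(String operatorUid, int parallelism, String key, long value) {}

    // Inner join: state rows whose hash has no metadata row are dropped.
    static List<Joined> joinOnHash(List<OperatorMeta> metas, List<StateRow> rows) {
        Map<String, OperatorMeta> byHash = new HashMap<>();
        for (OperatorMeta meta : metas) {
            byHash.put(meta.operatorHash(), meta);
        }
        List<Joined> joined = new ArrayList<>();
        for (StateRow row : rows) {
            OperatorMeta meta = byHash.get(row.operatorHash());
            if (meta != null) {
                joined.add(new Joined(meta.operatorUid(), meta.parallelism(), row.key(), row.value()));
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Operator values taken from the example table; the state row is made up.
        List<OperatorMeta> metas = List.of(
                new OperatorMeta("value-process-uid", "ca4f5fe9a637b656f09ea78b3e7a15b9", 2));
        List<StateRow> rows = List.of(
                new StateRow("ca4f5fe9a637b656f09ea78b3e7a15b9", "key-1", 42L));
        joinOnHash(metas, rows).forEach(System.out::println);
    }
}
```

Because the hash is unique and always present, it works as the join key even when the human-readable UID was never set.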
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> If we opt to use metadata columns,
>> > > > > > >>>> every
>> > > > > > >>>>>>>> record
>> > > > > > >>>>>>>>>> in
>> > > > > > >>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>> table
>> > > > > > >>>>>>>>>>>>>>>>>> would
>> > > > > > >>>>>>>>>>>>>>>>>>>> end
>> > > > > > >>>>>>>>>>>>>>>>>>>>> up
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> having identical values for these
>> > > > > > >>>>> columns
>> > > > > > >>>>>>>>> (please
>> > > > > > >>>>>>>>>>>>> correct
>> > > > > > >>>>>>>>>>>>>>> me
>> > > > > > >>>>>>>>>>>>>>>> if
>> > > > > > >>>>>>>>>>>>>>>>>> I'm
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> mistaken). On the other hand, the
>> > > > > > >>>> state
>> > > > > > >>>>>>>>> connector
>> > > > > > >>>>>>>>>>>>>> requires
>> > > > > > >>>>>>>>>>>>>>>>> users
>> > > > > > >>>>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> specify
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> an operator UID or operator UID
>> > > > > > >> hash,
>> > > > > > >>>>>>> after
>> > > > > > >>>>>>>>> which
>> > > > > > >>>>>>>>>>> it
>> > > > > > >>>>>>>>>>>>>>> outputs
>> > > > > > >>>>>>>>>>>>>>>>>>>>> user-defined
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> values in its records. This
>> > > > > > >> approach
>> > > > > > >>>>> feels
>> > > > > > >>>>>>>>>> somewhat
>> > > > > > >>>>>>>>>>>>>>> redundant
>> > > > > > >>>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>>> me.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> If we would add a new
>> > > > > > >>>> `savepoint-metadata`
>> > > > > > >>>>>>>>>> connector
>> > > > > > >>>>>>>>>>>> then
>> > > > > > >>>>>>>>>>>>>>> this
>> > > > > > >>>>>>>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>>>>> be
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> addressed.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> On the other hand, UID and UID hash have an either-or
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> relationship from a config perspective, so a user who
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> provides the UID may still be interested in the hash
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> for further calculations (all of the Flink internals
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> depend on the hash). Printing out the human-readable
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> UID is an explicit requirement from the user side
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> because hashes are not human readable.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> 3. Handling LIST and MAP States in
>> > > > > > >>>> the
>> > > > > > >>>>>>> State
>> > > > > > >>>>>>>>>>>> Connector
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> I have concerns about how the
>> > > > > > >> current
>> > > > > > >>>>>>> design
>> > > > > > >>>>>>>>>>> handles
>> > > > > > >>>>>>>>>>>>> LIST
>> > > > > > >>>>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>>> MAP
>> > > > > > >>>>>>>>>>>>>>>>>>>>> states.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Specifically, the state connector
>> > > > > > >>>> uses
>> > > > > > >>>>>>> Flink
>> > > > > > >>>>>>>>>> SQL's
>> > > > > > >>>>>>>>>>>> MAP
>> > > > > > >>>>>>>>>>>>>> and
>> > > > > > >>>>>>>>>>>>>>>>> ARRAY
>> > > > > > >>>>>>>>>>>>>>>>>>>> types,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> which implies that it attempts to
>> > > > > > >>>> load
>> > > > > > >>>>>>> entire
>> > > > > > >>>>>>>>> MAP
>> > > > > > >>>>>>>>>>> or
>> > > > > > >>>>>>>>>>>>> LIST
>> > > > > > >>>>>>>>>>>>>>>>> states
>> > > > > > >>>>>>>>>>>>>>>>>>> into
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> memory.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> However, in many real-world
>> > > > > > >>>> scenarios,
>> > > > > > >>>>>>> these
>> > > > > > >>>>>>>>>> states
>> > > > > > >>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>> grow
>> > > > > > >>>>>>>>>>>>>>>>> very
>> > > > > > >>>>>>>>>>>>>>>>>>>>> large.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Typically, the state API addresses
>> > > > > > >>>> this
>> > > > > > >>>>> by
>> > > > > > >>>>>>>>>>> providing
>> > > > > > >>>>>>>>>>>> an
>> > > > > > >>>>>>>>>>>>>>>>> iterator
>> > > > > > >>>>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> traverse elements within the state
>> > > > > > >>>>>>>>> incrementally.
>> > > > > > >>>>>>>>>>> I'm
>> > > > > > >>>>>>>>>>>>>>> unsure
>> > > > > > >>>>>>>>>>>>>>>>>>> whether
>> > > > > > >>>>>>>>>>>>>>>>>>>>> I've
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> missed something in FLIP-496 or
>> > > > > > >>>>> FLIP-512,
>> > > > > > >>>>>>> but
>> > > > > > >>>>>>>>> it
>> > > > > > >>>>>>>>>>>> seems
>> > > > > > >>>>>>>>>>>>>> that
>> > > > > > >>>>>>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>>>>>> current
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> design might struggle with
>> > > > > > >>>> scalability
>> > > > > > >>>>> in
>> > > > > > >>>>>>>> such
>> > > > > > >>>>>>>>>>> cases.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> You're right, the current implementation keeps the
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> state for a single key in memory.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Back then we considered this potential issue and
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> concluded that addressing it is not necessarily
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> needed for the initial version and can be done as a
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> later improvement.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB
>> > > > > > >>>>>>> savepoints
>> > > > > > >>>>>>>>> that
>> > > > > > >>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>> number
>> > > > > > >>>>>>>>>>>>>>>> of
>> > > > > > >>>>>>>>>>>>>>>>>> keys
>> > > > > > >>>>>>>>>>>>>>>>>>>> can
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> be extremely huge but not the per key
>> > > > > > >>>>> state
>> > > > > > >>>>>>>>> itself.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature
>> > > > > > >> as-is
>> > > > > > >>>>> and
>> > > > > > >>>>>>> can
>> > > > > > >>>>>>>>> be
>> > > > > > >>>>>>>>>>>>> handled
>> > > > > > >>>>>>>>>>>>>>> in a
>> > > > > > >>>>>>>>>>>>>>>>>>>> separate
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> jira.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Best,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Shengkai
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Gabor Somogyi <gabor.g.somo...@gmail.com> wrote on Mon, Mar 3, 2025 at 02:00:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> Hi Zakelly,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> To keep things simple, the target is `METADATA VIRTUAL`
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> as the keywords for definition.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> If it's not super complex, the latter can be added too.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> BR,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> G
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi Gabor,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> +1 for this.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Will the metadata column use
>> > > > > > >>>>> `METADATA
>> > > > > > >>>>>>>>>> VIRTUAL`
>> > > > > > >>>>>>>>>>>> as
>> > > > > > >>>>>>>>>>>>>> key
>> > > > > > >>>>>>>>>>>>>>>>> words
>> > > > > > >>>>>>>>>>>>>>>>>>> for
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> definition, or `METADATA FROM
>> > > > > > >> xxx
>> > > > > > >>>>>>>> VIRTUAL`
>> > > > > > >>>>>>>>>> for
>> > > > > > >>>>>>>>>>>>>>> renaming,
>> > > > > > >>>>>>>>>>>>>>>>> just
>> > > > > > >>>>>>>>>>>>>>>>>>>> like
>> > > > > > >>>>>>>>>>>>>>>>>>>>>> the
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Kafka table?
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Zakelly
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Hi All,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to start a
>> > > > > > >> discussion
>> > > > > > >>>> of
>> > > > > > >>>>>>>>> FLIP-512:
>> > > > > > >>>>>>>>>>> Add
>> > > > > > >>>>>>>>>>>>>> meta
>> > > > > > >>>>>>>>>>>>>>>>>>>> information
>> > > > > > >>>>>>>>>>>>>>>>>>>>> to
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>> SQL
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> state connector [1].
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Feel free to add your
>> > > > > > >> thoughts
>> > > > > > >>>> to
>> > > > > > >>>>>>> make
>> > > > > > >>>>>>>>> this
>> > > > > > >>>>>>>>>>>>> feature
>> > > > > > >>>>>>>>>>>>>>>>> better.
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> BR,
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> G
>> > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
