Re: [DISCUSS] FLIP-512: Add meta information to SQL state connector

Leonard Xu Wed, 26 Mar 2025 19:08:02 -0700

Your link is broken, Shengkai 

Best,
Leonard


> On Mar 27, 2025, at 10:01, Shengkai Fang <[email protected]> wrote:
> 
> Hi, All.
> 
> I wrote a simple demo to illustrate my idea. Hope this helps.
> 
> Best,
> Shengkai
> 
> https://github.com/apache/flink/compare/master...fsk119:flink:example?expand=1
> 
> On Wed, Mar 26, 2025 at 3:54 PM Gabor Somogyi <[email protected]> wrote:
> 
>>> I'm fine with a separate SQL connector for metadata, so maybe we could
>> update the FLIP about our discussion?
>> 
>> Sorry, I've forgotten this part. Yeah, no matter what we choose I'm going to
>> update the FLIP.
>> 
>> G
>> 
>> 
>> On Wed, Mar 26, 2025 at 8:51 AM Gabor Somogyi <[email protected]>
>> wrote:
>> 
>>> Hi All,
>>> 
>>> I also lack knowledge of PTFs, so I've only read the motivation
>>> part:
>>> 
>>> "The SQL 2016 standard introduced a way of defining custom SQL operators
>>> defined by ISO/IEC 19075-7:2021 (Part 7: Polymorphic table functions).
>>> ~200 pages define how this new kind of function can consume and produce
>>> tables with various execution properties.
>>> Unfortunately, this part of the standard is not publicly available."
>>> 
>>> Of course we can take a look at some examples but do we really want to
>>> expose state data with this construct
>>> which is described in ~200 pages and part of the standard is not publicly
>>> available? 🙂
>>> I mean the dataset is a couple of rows and the use-case is a join with
>>> another table, such as the state data.
>>> If somebody can name the advantages I would buy that, but from my limited
>>> understanding this would be overkill here.
>>> 
>>> BR,
>>> G
>>> 
>>> 
>>> On Wed, Mar 26, 2025 at 8:28 AM Gyula Fóra <[email protected]> wrote:
>>> 
>>>> Hi Zakelly, Shengkai!
>>>> 
>>>> I don't know too much about PTFs, it would be interesting to see how the
>>>> usage would look in practice.
>>>> 
>>>> Do you have some mockup/example in mind of how the PTF would look, for
>>>> example when we want to:
>>>> - Simply display/aggregate what's in the metadata
>>>> - Join keyed state with some metadata columns
>>>> 
>>>> Thanks
>>>> Gyula
>>>> 
>>>> On Wed, Mar 26, 2025 at 7:33 AM Zakelly Lan <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> I'm fine with a separate SQL connector for metadata, so maybe we could
>>>>> update the FLIP about our discussion? Also, Shengkai provided a PTF
>>>>> implementation; does that also meet the requirement?
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Zakelly
>>>>> 
>>>>> On Thu, Mar 20, 2025 at 4:47 PM Gabor Somogyi <
>>>> [email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> @Zakelly: Gyula summarised correctly what I meant, so please treat
>>>>>> the content as mine.
>>>>>> In addition, I'm not against adding a CLI at all; I'm just stating
>>>>>> that in some cases like this, users would like to have
>>>>>> a self-serving solution where they can provide SQL statements which
>>>>>> can trigger alerts automatically.
>>>>>> 
>>>>>> My personal opinion is that a CLI would be beneficial in several
>>>>>> cases. A good example is when users want to restart a job
>>>>>> from specific Kafka offsets which are persisted in a savepoint. In
>>>>>> such a scenario users are more than happy, since they
>>>>>> expect manual intervention with full control. So all in all, one can
>>>>>> count on my +1 when a CLI FLIP comes up...
>>>>>> 
>>>>>> BR,
>>>>>> G
>>>>>> 
>>>>>> 
>>>>>> On Thu, Mar 20, 2025 at 8:20 AM Gyula Fóra <[email protected]>
>>>> wrote:
>>>>>> 
>>>>>>> Hi!
>>>>>>> 
>>>>>>> @Zakelly Lan <[email protected]>
>>>>>>> I think what Gabor means is that users want to have predefined SQL
>>>>>>> scripts to perform state analysis tasks to debug/identify problems,
>>>>>>> such as a SQL script that joins the metadata table with the state
>>>>>>> and does some analytics on it.
>>>>>>> 
>>>>>>> If we have a meta table then the SQL script that can do this is
>> fixed
>>>>> and
>>>>>>> users can trigger this on demand by simply providing a new
>> savepoint
>>>>> path.
>>>>>>> 
>>>>>>> If we have a different mechanism to extract metadata that is not
>> SQL
>>>>>>> native
>>>>>>> then manual steps need to be executed and a custom SQL script would
>>>> need
>>>>>>> to
>>>>>>> be written that adds the manually extracted metadata into the
>> script.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Gyula
>>>>>>> 
>>>>>>> On Thu, Mar 20, 2025 at 4:32 AM Zakelly Lan <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> Thanks for your answers! Getting everyone aligned on this topic
>> is
>>>>>>>> challenging, but it’s definitely worth the effort since it will
>>>> help
>>>>>>>> streamline things moving forward.
>>>>>>>> 
>>>>>>>> @Gabor are you saying that users are using some scripts to define
>>>> the
>>>>>>> SQL
>>>>>>>> metadata connector and get the information, right? If so, would a
>>>> CLI
>>>>>>> tool
>>>>>>>> be more convenient? It's easy to invoke and can get the result
>>>>> swiftly.
>>>>>>> And
>>>>>>>> there should be some other systems to track the checkpoint
>> lineage
>>>> and
>>>>>>>> analyze if there are outliers in metadata (e.g. state size of one
>>>>>>> operator)
>>>>>>>> right? Well, maybe I missed something so please correct me if I'm
>>>>> wrong.
>>>>>>>> 
>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL
>> native
>>>>>>>>> environment where we can serve complex use-cases like you would
>>>>> expect
>>>>>>>> in a
>>>>>>>>> regular database.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> @Gyula Well, this is a good point. From the perspective of
>>>>> comprehensive
>>>>>>>> SQL experience, I'd +1 for treating metadata as data. Although I
>>>> doubt
>>>>>>> if
>>>>>>>> there is a need for processing metadata, I won't be against a
>>>> separate
>>>>>>>> connector.
>>>>>>>> 
>>>>>>>> Regarding the CLI tool, I still think it’s worth implementing.
>>>> Such a
>>>>>>> tool
>>>>>>>> could provide savepoint information before resuming from a
>>>> savepoint,
>>>>>>> which
>>>>>>>> would enhance the user experience in CLI-based workflows. It
>> would
>>>> be
>>>>>>> good
>>>>>>>> if someone could implement this feature. We shouldn’t worry about
>>>>>>> whether
>>>>>>>> this tool might be retired in the future. Regardless of the
>>>> SQL-based
>>>>>>>> solution we eventually adopt, this capability will remain
>> essential
>>>>> for
>>>>>>> CLI
>>>>>>>> users. This is another topic.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Zakelly
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Mar 20, 2025 at 10:37 AM Shengkai Fang <
>> [email protected]>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi.
>>>>>>>>> 
>>>>>>>>> After reading the doc[1], I think Spark provides a function for
>>>>> users
>>>>>>> to
>>>>>>>>> consume the metadata from the savepoint.  In Flink SQL, similar
>>>>>>>>> functionality is implemented through Polymorphic Table
>> Functions
>>>>>>> (PTF) as
>>>>>>>>> proposed in FLIP-440[2]. Below is a code example[3]
>> illustrating
>>>>> this
>>>>>>>>> concept:
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>>    public static class ScalarArgsFunction extends
>>>>>>>>> TestProcessTableFunctionBase {
>>>>>>>>>        public void eval(Integer i, Boolean b) {
>>>>>>>>>            collectObjects(i, b);
>>>>>>>>>        }
>>>>>>>>>    }
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> INSERT INTO sink SELECT * FROM f(i => 42, b => CAST('TRUE' AS BOOLEAN))
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> So we can add a builtin function named `read_state_metadata` to
>>>> read
>>>>>>>>> savepoint data.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Shengkai
>>>>>>>>> 
>>>>>>>>> [1] https://docs.databricks.com/aws/en/structured-streaming/read-state?language=SQL
>>>>>>>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
>>>>>>>>> [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/ProcessTableFunctionTestPrograms.java#L140
>>>>>>>>> 
>>>>>>>>> On Wed, Mar 19, 2025 at 6:37 PM Gyula Fóra <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi All!
>>>>>>>>>> 
>>>>>>>>>> Thank you for the answers and concerns from everyone.
>>>>>>>>>> 
>>>>>>>>>> On the CLI vs State Metadata Connector/Table question I would
>>>> also
>>>>>>> like
>>>>>>>>> to
>>>>>>>>>> step back a little and look at the bigger picture.
>>>>>>>>>> 
>>>>>>>>>> I think the overall vision in Flink SQL is to provide a SQL
>>>> native
>>>>>>>>>> environment where we can serve complex use-cases like you
>> would
>>>>>>> expect
>>>>>>>>> in a
>>>>>>>>>> regular database.
>>>>>>>>>> Most features and developments in recent years have gone this
>>>>>>>>>> way.
>>>>>>>>>> 
>>>>>>>>>> The State Metadata Table would be a natural and
>> straightforward
>>>>> fit
>>>>>>>> here.
>>>>>>>>>> So from my side, +1 for that.
>>>>>>>>>> 
>>>>>>>>>> However I could understand if we are not ready to add a new
>>>>>>>>>> connector/format due to maintenance concerns (and in general
>>>>> concern
>>>>>>>>> about
>>>>>>>>>> the design).
>>>>>>>>>> If that's the issue then we should spend more time on the
>>>> design
>>>>> to
>>>>>>> get
>>>>>>>>>> comfortable with the approach and seek feedback from the
>> wider
>>>>>>>> community
>>>>>>>>>> 
>>>>>>>>>> I am -1 for the CLI/tooling approach as that will not provide
>>>> the
>>>>>>>>>> featureset we are looking for that is not already covered by
>>>> the
>>>>>>> Java
>>>>>>>>>> connector. And that approach would come with the same
>>>> maintenance
>>>>>>>>>> implications.
>>>>>>>>>> 
>>>>>>>>>> Cheers
>>>>>>>>>> Gyula
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Mar 19, 2025 at 11:24 AM Gabor Somogyi <
>>>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Zakelly, Shengkai
>>>>>>>>>>> 
>>>>>>>>>>> Several topics are going on so adding gist answers to them.
>>>> When
>>>>>>> some
>>>>>>>>>> topic
>>>>>>>>>>> is not touched please highlight it.
>>>>>>>>>>> 
>>>>>>>>>>> @Shengkai: I've read through all the previous FLIPs related to
>>>>>>>>>>> catalogs, and if we would like to keep the concepts there
>>>>>>>>>>> then one-to-one mapping relationship between savepoint and
>>>>> catalog
>>>>>>>> is a
>>>>>>>>>>> reasonable direction. In short I'm happy that
>>>>>>>>>>> you've highlighted this and agree as a whole. I've written
>> it
>>>>> down
>>>>>>>>>>> previously, just want to double confirm that state catalog
>> is
>>>>>>>>>>> essential and planned. When we reach this point then your
>>>> input
>>>>> is
>>>>>>>> more
>>>>>>>>>>> than welcome.
>>>>>>>>>>> 
>>>>>>>>>>> @Zakelly: We've tried the CLI and separate library
>> approaches
>>>>> with
>>>>>>>>> users
>>>>>>>>>>> already and these are not something which is welcome
>> because
>>>> of
>>>>>>> the
>>>>>>>>>>> following:
>>>>>>>>>>> * Users want to have automated tasks and not manual
>>>> CLI/library
>>>>>>>> output
>>>>>>>>>>> parsing. This can be hacked around but our experience is
>>>>> negative
>>>>>>> on
>>>>>>>>> this
>>>>>>>>>>> because it's just brittle.
>>>>>>>>>>> * From a development perspective it's a much bigger effort than a
>>>>>>>>>>> connector (hard to test, packaging/version handling is an extra
>>>>>>>>>>> layer of complexity, external FS authentication is a pain for
>>>>>>>>>>> users, and they are also expected to download savepoints)
>>>>>>>>>>> * Purely personal opinion, but if we find better ways later,
>>>>>>>>>>> retiring a CLI is not more lightweight than retiring a connector
>>>>>>>>>>> 
>>>>>>>>>>>> It would be great if you give some examples on how user
>>>> could
>>>>>>>>> leverage
>>>>>>>>>>> the separate connector to process the metadata.
>>>>>>>>>>> 
>>>>>>>>>>> The simplest cases:
>>>>>>>>>>> * give me the overgrowing state uids
>>>>>>>>>>> * give me the unknown (new or renamed) state uids
>>>>>>>>>>> * give me the state uids where the state size dropped drastically
>>>>>>>>>>> compared to a previous savepoint (accidental state loss)
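
The three checks listed above could be sketched in plain Java roughly as follows. This is only an illustration under assumptions: each savepoint's metadata is modeled as a map from state uid to state size in bytes, and the class name, method names, and the `factor`/`ratio` thresholds are hypothetical, not part of the FLIP or of any Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: savepoint metadata is modeled as state uid -> state size in bytes. */
final class StateUidChecks {

    /** Uids whose size grew by more than {@code factor} since the previous savepoint. */
    static List<String> overgrowing(Map<String, Long> previous, Map<String, Long> current, double factor) {
        List<String> result = new ArrayList<>();
        // TreeMap yields uids in sorted order, so the output is deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(current).entrySet()) {
            Long before = previous.get(e.getKey());
            if (before != null && e.getValue() > before * factor) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Uids in the current savepoint that are not in the known set (new or renamed). */
    static List<String> unknown(Set<String> known, Map<String, Long> current) {
        List<String> result = new ArrayList<>();
        for (String uid : new TreeMap<>(current).keySet()) {
            if (!known.contains(uid)) {
                result.add(uid);
            }
        }
        return result;
    }

    /** Uids whose size fell below {@code ratio} times the previous size (possible state loss). */
    static List<String> droppedSharply(Map<String, Long> previous, Map<String, Long> current, double ratio) {
        List<String> result = new ArrayList<>();
        // Only uids present in both savepoints are compared here.
        for (Map.Entry<String, Long> e : new TreeMap<>(current).entrySet()) {
            Long before = previous.get(e.getKey());
            if (before != null && e.getValue() < before * ratio) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With a SQL-native metadata table the same logic would presumably be a join between two metadata tables; the sketch only makes explicit what each check computes.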
>>>>>>>>>>> 
>>>>>>>>>>> Since it was mentioned: as a general offtopic teaser, yeah
>> it
>>>>>>> would
>>>>>>>> be
>>>>>>>>>> good
>>>>>>>>>>> to have some sort of checkpoint/savepoint lineage or
>> however
>>>> we
>>>>>>> call
>>>>>>>>> it.
>>>>>>>>>>> Since we've not yet reached this point there are no
>> technical
>>>>>>>> details,
>>>>>>>>>> it's
>>>>>>>>>>> more like a vision. It's a common pattern that
>>>>>>>>>>> jobs are physically running but somehow the state
>> processing
>>>> is
>>>>>>> stuck
>>>>>>>>> and
>>>>>>>>>>> it would be good to add some way to find it out
>>>> automatically.
>>>>>>>>>>> The key point here is automation rather than manual evaluation,
>>>>>>>>>>> since handling 10k+ jobs simply does not allow the latter.
>>>>>>>>>>> 
>>>>>>>>>>> BR,
>>>>>>>>>>> G
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Mar 19, 2025 at 6:46 AM Shengkai Fang <
>>>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi, All.
>>>>>>>>>>>> 
>>>>>>>>>>>> About State Catalog, I want to share more thoughts about
>>>> this.
>>>>>>>>>>>> 
>>>>>>>>>>>> In the initial design concept, I understood that a
>>>> savepoint
>>>>>>> and a
>>>>>>>>>> state
>>>>>>>>>>>> catalog have a one-to-one mapping relationship. Each
>>>> operator
>>>>>>>>>> corresponds
>>>>>>>>>>>> to a database, and the state of each operator is
>>>> represented
>>>>> as
>>>>>>>>>>> individual
>>>>>>>>>>>> tables. The rationale behind this design is:
>>>>>>>>>>>> 
>>>>>>>>>>>> *State Diversity*: An operator may involve multiple types
>>>> of
>>>>>>>> states.
>>>>>>>>>> For
>>>>>>>>>>>> example, in our VVR design, a "multi-join" operator uses
>>>> keyed
>>>>>>>> states
>>>>>>>>>> for
>>>>>>>>>>>> two input streams and a broadcast state for the third
>>>> stream.
>>>>>>> This
>>>>>>>>>> makes
>>>>>>>>>>> it
>>>>>>>>>>>> challenging to represent all states of an operator
>> within a
>>>>>>> single
>>>>>>>>>> table.
>>>>>>>>>>>> *Scalability*: Internally, an operator might have
>> multiple
>>>>> keyed
>>>>>>>>> states
>>>>>>>>>>>> (e.g., value state and list state). However, large list
>>>> states
>>>>>>> may
>>>>>>>>> not
>>>>>>>>>>> fit
>>>>>>>>>>>> entirely in memory. To address this, we recommend
>>>> implementing
>>>>>>> each
>>>>>>>>>> state
>>>>>>>>>>>> as a separate table.
>>>>>>>>>>>> 
>>>>>>>>>>>> To resolve the loosely coupled relationships between
>>>> operator
>>>>>>>> states,
>>>>>>>>>> we
>>>>>>>>>>>> propose embedding predefined views within the catalog.
>>>> These
>>>>>>> views
>>>>>>>>>>> simplify
>>>>>>>>>>>> user understanding of operator implementations and
>> provide
>>>> a
>>>>>>> more
>>>>>>>>>>> intuitive
>>>>>>>>>>>> perspective. For instance, a join operator may have
>>>> multiple
>>>>>>> state
>>>>>>>>>>>> implementations (depending on whether the join key
>> includes
>>>>>>> unique
>>>>>>>>>>>> attributes), but users primarily care about the data
>>>>> associated
>>>>>>>> with
>>>>>>>>> a
>>>>>>>>>>>> specific join key across input streams.
>>>>>>>>>>>> 
>>>>>>>>>>>> Returning to the one-to-one mapping between savepoints
>> and
>>>>>>>> catalogs,
>>>>>>>>> we
>>>>>>>>>>> aim
>>>>>>>>>>>> to manage multiple user state catalogs through a catalog
>>>>> store.
>>>>>>>> When
>>>>>>>>> a
>>>>>>>>>>> user
>>>>>>>>>>>> triggers a savepoint for a job on the platform:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. The platform sends a REST request to the JobManager.
>>>>>>>>>>>> 2. Simultaneously, it registers a new state catalog in
>> the
>>>>>>> catalog
>>>>>>>>>> store,
>>>>>>>>>>>> enabling immediate analysis of state data on the
>> platform.
>>>>>>>>>>>> 3. Deleting a savepoint would also trigger the removal of
>>>> its
>>>>>>>>>> associated
>>>>>>>>>>>> catalog.
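
The one-to-one lifecycle described in the steps above might look like the following minimal sketch. It is an in-memory stand-in under stated assumptions: `SavepointCatalogRegistry` and its methods are hypothetical names, it does not use Flink's catalog store interfaces, and the REST call to the JobManager is left out entirely.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a catalog store; not a Flink API. */
final class SavepointCatalogRegistry {

    // catalog name -> savepoint path; exactly one entry per savepoint.
    private final Map<String, String> pathByCatalog = new ConcurrentHashMap<>();

    /** Naming follows the `savepoint-${id}` pattern used in this thread. */
    static String catalogName(String savepointId) {
        return "savepoint-" + savepointId;
    }

    /** Registers the catalog for a newly triggered savepoint; rejects duplicates. */
    String register(String savepointId, String savepointPath) {
        String name = catalogName(savepointId);
        if (pathByCatalog.putIfAbsent(name, savepointPath) != null) {
            throw new IllegalStateException("Catalog already registered: " + name);
        }
        return name;
    }

    /** Looks up the savepoint path behind a savepoint's catalog, if registered. */
    Optional<String> savepointPath(String savepointId) {
        return Optional.ofNullable(pathByCatalog.get(catalogName(savepointId)));
    }

    /** Deleting a savepoint removes its catalog; returns false if none existed. */
    boolean delete(String savepointId) {
        return pathByCatalog.remove(catalogName(savepointId)) != null;
    }
}
```

Rejecting a second registration for the same savepoint id is one way to keep the mapping strictly one-to-one; a real platform might instead choose to overwrite or version the entry.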
>>>>>>>>>>>> 
>>>>>>>>>>>> This vision assumes that states are self-describing or
>>>> that a
>>>>>>> state
>>>>>>>>>>>> metaservice is introduced to analyze savepoint
>> structures.
>>>>>>>>>>>> 
>>>>>>>>>>>>> How can users create logic to identify differences
>>>> between
>>>>>>>> multiple
>>>>>>>>>>>> savepoints?
>>>>>>>>>>>> 
>>>>>>>>>>>> Since savepoints and state catalogs are one-to-one
>> mapped,
>>>>> users
>>>>>>>> can
>>>>>>>>>>> query
>>>>>>>>>>>> metadata via their respective catalogs. For example:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. `savepoint-${id}`.`system`.`metadata_table`.`<operator-name>`
>>>>>>>>>>>> provides operator-specific metadata (e.g., state size, type).
>>>>>>>>>>>> 2. Comparing metadata tables (e.g., schema versions,
>> state
>>>>> entry
>>>>>>>>>> counts)
>>>>>>>>>>>> across catalogs reveals structural or quantitative
>>>>> differences.
>>>>>>>>>>>> 3. For deeper analysis, users could write SQL queries to
>>>>> compare
>>>>>>>>>> specific
>>>>>>>>>>>> state partitions or leverage the metaservice to track
>> state
>>>>>>>> evolution
>>>>>>>>>>>> (e.g., added/removed operators, modified state
>>>>> configurations).
>>>>>>>>>>>> 
>>>>>>>>>>>> If we plan to introduce a state catalog in the future, I
>>>> would
>>>>>>> lean
>>>>>>>>>>> toward
>>>>>>>>>>>> using metadata tables. If a utility tool can address the
>>>>>>> challenges
>>>>>>>>> we
>>>>>>>>>>>> face, could we avoid introducing an additional connector?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Shengkai
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Mar 17, 2025 at 8:25 PM Gyula Fóra <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi All!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Without going into too much detail here are my 2 cents
>>>>>>> regarding
>>>>>>>>> the
>>>>>>>>>>>>> virtual column / catalog metadata / table (connector)
>>>>>>> discussion
>>>>>>>>> for
>>>>>>>>>>> the
>>>>>>>>>>>>> State metadata.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> State metadata such as the types of states, their
>>>>> properties,
>>>>>>>>> names,
>>>>>>>>>>>> sizes
>>>>>>>>>>>>> etc are all valuable information that can be used to
>>>> enrich
>>>>>>> the
>>>>>>>>>>>>> computations we do on state.
>>>>>>>>>>>>> We can either analyze it standalone (such as discover
>>>>>>> anomalies,
>>>>>>>>> for
>>>>>>>>>>>> large
>>>>>>>>>>>>> jobs with many states), across multiple savepoints
>>>> (discover
>>>>>>> how
>>>>>>>>>> state
>>>>>>>>>>>>> changed over time) or by joining it with keyed or
>>>> non-keyed
>>>>>>> state
>>>>>>>>>> data
>>>>>>>>>>> to
>>>>>>>>>>>>> serve more complex queries on the state.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The only solution that seems to serve all these
>> use-cases
>>>>> and
>>>>>>>>>>>> requirements
>>>>>>>>>>>>> in a straightforward and SQL canonical way is to simply
>>>>> expose
>>>>>>>> the
>>>>>>>>>>> state
>>>>>>>>>>>>> metadata as a separate table. This is a metadata table
>>>> but
>>>>> you
>>>>>>>> can
>>>>>>>>>> also
>>>>>>>>>>>>> think of it as data table, it makes no practical
>>>> difference
>>>>>>> here.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Once we have a catalog later, the catalog can offer
>> this
>>>>> table
>>>>>>>> out
>>>>>>>>> of
>>>>>>>>>>> the
>>>>>>>>>>>>> box, the same way databases provide metadata tables.
>> For
>>>>> this
>>>>>>> to
>>>>>>>>> work
>>>>>>>>>>>>> however we need another, simpler connector that creates
>>>> this
>>>>>>>> table.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> +1 for state metadata as a separate connector/table,
>>>> instead
>>>>>>> of
>>>>>>>>>> adding
>>>>>>>>>>>>> virtual columns and adhoc catalog metadata that is hard
>>>> to
>>>>> use
>>>>>>>> in a
>>>>>>>>>>> large
>>>>>>>>>>>>> number of queries.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Gyula
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Mar 17, 2025 at 12:44 PM Gabor Somogyi <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I’m planning on adding this, and we may collaborate
>>>> on
>>>>> it
>>>>>>> in
>>>>>>>>> the
>>>>>>>>>>>>> future.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1 on this, just ping me.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> After some code digging and a POC, all I can say is that with
>>>>>>>>>>>>>> heavy effort we can maybe make changes that let us show the
>>>>>>>>>>>>>> metadata of a savepoint from the catalog.
>>>>>>>>>>>>>> I'm not against that, but from a user perspective this has
>>>>>>>>>>>>>> limited value; let me explain why.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> From a high-level perspective I see agreement on the following:
>>>>>>>>>>>>>> * We should have a catalog which represents the savepoint data
>>>>>>>>>>>>>> set of one or more jobs (future plan)
>>>>>>>>>>>>>> * It should be possible to register savepoints in the catalog,
>>>>>>>>>>>>>> where they then appear as databases (future plan)
>>>>>>>>>>>>>> * There must be a possibility to create tables from databases
>>>>>>>>>>>>>> where users can read state data (exists already)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In terms of metadata, If I understand correctly then
>>>> the
>>>>>>>>> suggested
>>>>>>>>>>>>> approach
>>>>>>>>>>>>>> would be to access
>>>>>>>>>>>>>> it from the catalog describe command, right? Adding
>>>> that
>>>>>>> info
>>>>>>>>> when
>>>>>>>>>>>>> specific
>>>>>>>>>>>>>> database describe command
>>>>>>>>>>>>>> is executed could be done.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The question is for instance how can users create
>> such
>>>> a
>>>>>>> logic
>>>>>>>>> that
>>>>>>>>>>>> tells
>>>>>>>>>>>>>> them what is
>>>>>>>>>>>>>> the difference between multiple savepoints?
>>>>>>>>>>>>>> Just to give some examples:
>>>>>>>>>>>>>> * per operator size changes between savepoints
>>>>>>>>>>>>>> * show values from operator data where state size
>>>> reaches
>>>>> a
>>>>>>>>>> boundary
>>>>>>>>>>>>>> * in general "find which checkpoint ruined things" is
>>>>> quite
>>>>>>>>> common
>>>>>>>>>>>>> pattern
>>>>>>>>>>>>>> What I would like to highlight here is that from Flink's point
>>>>>>>>>>>>>> of view the metadata can be considered static side-output
>>>>>>>>>>>>>> information, but for users these values are actual real data
>>>>>>>>>>>>>> around which logic is planned to be built.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The metadata is more like one-time information
>>>> instead
>>>>> of
>>>>>>> a
>>>>>>>>>>> streaming
>>>>>>>>>>>>>> data that changes all
>>>>>>>>>>>>>> the time, so a single connector seems to be an
>>>> overkill.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> State data is also static within a savepoint and
>> that's
>>>>> the
>>>>>>>>> reason
>>>>>>>>>>> why
>>>>>>>>>>>>> the
>>>>>>>>>>>>>> state processor API is working in batch mode.
>>>>>>>>>>>>>> When we handle multiple checkpoints in a streaming
>>>> fashion
>>>>>>> then
>>>>>>>>>> this
>>>>>>>>>>>> can
>>>>>>>>>>>>> be
>>>>>>>>>>>>>> viewed from another angle.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We can come up with a more lightweight solution than a new
>>>>>>>>>>>>>> connector, but forcing users to parse the catalog
>>>>>>>>>>>>>> describe command output in order to compare multiple savepoints
>>>>>>>>>>>>>> doesn't sound like a smooth user experience.
>>>>>>>>>>>>>> Honestly, I have no other idea for how to expose metadata as
>>>>>>>>>>>>>> real user data, so I'm waiting for other approaches.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>> G
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, Mar 13, 2025 at 2:44 AM Shengkai Fang <
>>>>>>>> [email protected]
>>>>>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Looking forward to hearing the good news!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 10:24 PM Gabor Somogyi <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for both the valuable input!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Let me take a closer look at the suggestions,
>> like
>>>> the
>>>>>>>>> Catalog
>>>>>>>>>>>>>>> capabilities
>>>>>>>>>>>>>>>> and possibility of embedding TypeInformation or
>>>>>>>>>>>>>>>> StateDescriptor metadata directly into the raw
>>>> state
>>>>>>>> files...
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>> G
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 8:17 AM Shengkai Fang <
>>>>>>>>>> [email protected]
>>>>>>>>>>>> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for Zakelly's clarification.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> +1 to delay the discussion about this.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I’d like to share my perspective on the State
>>>>> Catalog
>>>>>>>>>> proposal.
>>>>>>>>>>>>> While
>>>>>>>>>>>>>>>>> introducing this capability is beneficial,
>> there
>>>> is
>>>>> a
>>>>>>>>>> blocker:
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> current
>>>>>>>>>>>>>>>>> StateBackend architecture does not permit
>>>> operators
>>>>> to
>>>>>>>>> encode
>>>>>>>>>>>>>>>>> TypeInformation into the state—it only
>> preserves
>>>> the
>>>>>>>>>>> Serializer.
>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>> limitation creates an asymmetry, as operators
>>>> alone
>>>>>>>> retain
>>>>>>>>>>>>> knowledge
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> data structure’s schema.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> To address this, I suggest allowing operators
>> to
>>>>> embed
>>>>>>>>>>>>>> TypeInformation
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>> StateDescriptor metadata directly into the raw
>>>> state
>>>>>>>> files.
>>>>>>>>>>> Such
>>>>>>>>>>>> a
>>>>>>>>>>>>>>> design
>>>>>>>>>>>>>>>>> would enable the Catalog to:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1. Parse state files and programmatically
>> derive
>>>> the
>>>>>>>> schema
>>>>>>>>>> and
>>>>>>>>>>>>>>>> structural
>>>>>>>>>>>>>>>>> guarantees for each state.
>>>>>>>>>>>>>>>>> 2. Leverage existing Flink Table utilities,
>> such
>>>> as
>>>>>>>>>>>>>>>>> LegacyTypeInfoDataTypeConverter (in
>>>>>>>>>>>>>>> org.apache.flink.table.types.utils),
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> bridge TypeInformation and DataType
>> conversions.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If we can not store the TypeInformation or
>>>>>>>> StateDescriptor
>>>>>>>>>> into
>>>>>>>>>>>> the
>>>>>>>>>>>>>> raw
>>>>>>>>>>>>>>>>> state files, I am +1 for this FLIP to use
>>>> metadata
>>>>>>> column
>>>>>>>>> to
>>>>>>>>>>>>> retrieve
>>>>>>>>>>>>>>>>> information.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 12:43 PM Zakelly Lan <[email protected]> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Gabor and Shengkai,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for sharing your thoughts! This is a
>>>> long
>>>>>>>>> discussion
>>>>>>>>>>> and
>>>>>>>>>>>>>> sorry
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> the late reply (I'm busy catching up with
>>>> release
>>>>>>> 2.0
>>>>>>>>> these
>>>>>>>>>>>>> days).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Let me first clarify your thoughts to ensure
>> I
>>>>>>>> understand
>>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>>>>> IIUC,
>>>>>>>>>>>>>>>>>> there is no persistent configuration for
>> state
>>>> TTL
>>>>>>> in
>>>>>>>> the
>>>>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>>>>>> While
>>>>>>>>>>>>>>>>>> you can infer that TTL is enabled by reading
>>>> the
>>>>>>>>>> serializer,
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> checkpoint
>>>>>>>>>>>>>>>>>> itself only stores the last access time for
>>>> each
>>>>>>> value.
>>>>>>>>> So
>>>>>>>>>>> the
>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>> thing
>>>>>>>>>>>>>>>>>> we can show is the last access time for each
>>>>> value.
>>>>>>> But
>>>>>>>>> it
>>>>>>>>>> is
>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>> required
>>>>>>>>>>>>>>>>>> for all state backends to store this, as they
>>>> may
>>>>>>>>> directly
>>>>>>>>>>>> store
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> expired time. This will also increase the
>>>>>>> difficulty of
>>>>>>>>>>>>>>> implementation
>>>>>>>>>>>>>>>> &
>>>>>>>>>>>>>>>>>> maintenance.
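
To illustrate why the two storage conventions mentioned above complicate a TTL column, here is a small sketch that normalizes either one to an expiration timestamp. Everything in it is an assumption for illustration: the `TtlTimestamps` class and the `Convention` enum are hypothetical, and the TTL duration has to be supplied by the caller because, as noted above, it is not persisted in the checkpoint.

```java
import java.time.Duration;

/** Hypothetical helper; not part of Flink's state backend or TTL APIs. */
final class TtlTimestamps {

    /** How a backend is assumed to store the per-value timestamp. */
    enum Convention { LAST_ACCESS, EXPIRATION }

    /** Normalizes a stored timestamp (epoch millis) to an expiration time. */
    static long expirationMillis(long storedMillis, Convention convention, Duration ttl) {
        switch (convention) {
            case LAST_ACCESS:
                // Last access time plus the externally supplied TTL.
                return storedMillis + ttl.toMillis();
            case EXPIRATION:
                // The backend already stored the expiration time.
                return storedMillis;
            default:
                throw new IllegalArgumentException("Unknown convention: " + convention);
        }
    }

    /** True if the value has expired at the given point in time. */
    static boolean isExpired(long storedMillis, Convention convention, Duration ttl, long nowMillis) {
        return nowMillis >= expirationMillis(storedMillis, convention, ttl);
    }
}
```

A reader that does not know which convention a backend uses, or what TTL was configured, cannot produce a meaningful column from the raw timestamp alone.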
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> This once again reiterates the importance of
>>>>> unified
>>>>>>>>>> metadata
>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> checkpoints. I’m planning on adding this, and
>>>> we
>>>>> may
>>>>>>>>>>>> collaborate
>>>>>>>>>>>>> on
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> the future.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm not in favor of adding a new connector
>> for
>>>>>>>> metadata.
>>>>>>>>>> The
>>>>>>>>>>>>>> metadata
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> more like one-time information instead of a
>>>>>>> streaming
>>>>>>>>> data
>>>>>>>>>>> that
>>>>>>>>>>>>>>> changes
>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>> the time, so a single connector seems to be
>> an
>>>>>>>> overkill.
>>>>>>>>> It
>>>>>>>>>>> is
>>>>>>>>>>>>> not
>>>>>>>>>>>>>>> easy
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> withdraw a connector if we have a better
>>>> solution
>>>>> in
>>>>>>>>>> future.
>>>>>>>>>>>> I'm
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>> familiar with current Catalog capabilities,
>>>> and if
>>>>>>> it
>>>>>>>>> could
>>>>>>>>>>>>> extract
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> show some operator-level information from
>>>>> savepoint,
>>>>>>>> that
>>>>>>>>>>> would
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> great.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If the Catalog can't do that, I would
>> consider
>>>> the
>>>>>>>>> current
>>>>>>>>>>> FLIP
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> be a
>>>>>>>>>>>>>>>>>> compromise solution.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> And if we have that unified metadata for
>>>>>>>>>> checkpoint/savepoint
>>>>>>>>>>>> in
>>>>>>>>>>>>>>>> future,
>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>> may directly register savepoint in catalog,
>> and
>>>>>>> create
>>>>>>>> a
>>>>>>>>>>> source
>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>> specifying complex columns, as well as
>> describe
>>>>> the
>>>>>>>>>> savepoint
>>>>>>>>>>>>>> catalog
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> get the metadata. That's a good solution in
>> my
>>>>> mind.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Zakelly
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Mar 12, 2025 at 10:35 AM Shengkai Fang <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi Gabor,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with
>>>>>>> `savepoint-metadata`
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I would argue against introducing a new
>>>>> connector
>>>>>>>> type
>>>>>>>>>>> named
>>>>>>>>>>>>>>>>>>> savepoint-metadata, as the existing Catalog
>>>>>>> mechanism
>>>>>>>>> can
>>>>>>>>>>>>>>> inherently
>>>>>>>>>>>>>>>>>>> provide the necessary connector factory
>>>>>>> capabilities.
>>>>>>>>>> I’ve
>>>>>>>>>>>>>> detailed
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>> proposal in branch[1]. Please take a moment
>>>> to
>>>>>>> review
>>>>>>>>> it.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> If we introduce a connector named `savepoint-metadata`, it
>>>>>>>>>>>>>>>>>>> means users can create a temporary table with the connector
>>>>>>>>>>>>>>>>>>> `savepoint-metadata`, and the connector needs to check
>>>>>>>>>>>>>>>>>>> whether the table schema is the same as the schema we
>>>>>>>>>>>>>>>>>>> proposed in the FLIP. On the other hand, it's not easy for
>>>>>>>>>>>>>>>>>>> others to use a metadata table with the same schema.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>> https://github.com/apache/flink/compare/master...fsk119:flink:state-metadata?expand=1#diff-712a7bc92fe46c405fb0e61b475bb2a005cb7a72bab7df28bbb92744bcb5f465R63
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 16:56, Gabor Somogyi <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Directionally, I agree with your idea of how it can be
>>>>>>>>>>>>>>>>>>>> implemented. Previously I've mentioned that TTL information
>>>>>>>>>>>>>>>>>>>> is not exposed on the state processor API (which the SQL
>>>>>>>>>>>>>>>>>>>> state connector uses to read data), and unless somebody
>>>>>>>>>>>>>>>>>>>> shows me the opposite, this FLIP is not going to address
>>>>>>>>>>>>>>>>>>>> this, to avoid feature creep. Our users are also interested
>>>>>>>>>>>>>>>>>>>> in TTL, so sooner or later we're going to expose it; this is
>>>>>>>>>>>>>>>>>>>> a matter of scheduling.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm not sure I understand your point about StateCatalog.
>>>>>>>>>>>>>>>>>>>> First of all, I can't agree more that StateCatalog is needed,
>>>>>>>>>>>>>>>>>>>> and it is a planned building block in an upcoming FLIP, but
>>>>>>>>>>>>>>>>>>>> I'm not sure how it can help now. No matter what, your
>>>>>>>>>>>>>>>>>>>> knowledge is essential when we add StateCatalog. Let me lay
>>>>>>>>>>>>>>>>>>>> out my understanding in this area:
>>>>>>>>>>>>>>>>>>>> * First we need create table statements to access state data
>>>>>>>>>>>>>>>>>>>>   and metadata
>>>>>>>>>>>>>>>>>>>> * When we have that, we can add StateCatalog, which could
>>>>>>>>>>>>>>>>>>>>   potentially ease the life of users by, for example,
>>>>>>>>>>>>>>>>>>>>   providing off-the-shelf tables without sweating over
>>>>>>>>>>>>>>>>>>>>   create table statements
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> User expectations:
>>>>>>>>>>>>>>>>>>>> * See state data (this is fulfilled by the existing connector)
>>>>>>>>>>>>>>>>>>>> * See metadata about state data, like TTL (this can be added
>>>>>>>>>>>>>>>>>>>>   as a metadata column as you suggested, since it belongs to
>>>>>>>>>>>>>>>>>>>>   the data)
>>>>>>>>>>>>>>>>>>>> * See metadata about operators (this can be added from
>>>>>>>>>>>>>>>>>>>>   savepoint-metadata)
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> It is important to highlight that the state data table format
>>>>>>>>>>>>>>>>>>>> differs from the state metadata table format. Namely, one
>>>>>>>>>>>>>>>>>>>> table has rows for state values and another has rows for
>>>>>>>>>>>>>>>>>>>> operators, right?
>>>>>>>>>>>>>>>>>>>> I think that's the reason why you've pointed out that the
>>>>>>>>>>>>>>>>>>>> suggested metadata columns are somewhat clunky.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> In conclusion, I agree to add the ${state-name}_ttl metadata
>>>>>>>>>>>>>>>>>>>> column later on, since it belongs to the state value, and to
>>>>>>>>>>>>>>>>>>>> add a new table type for metadata (like you suggested,
>>>>>>>>>>>>>>>>>>>> similar to PG [1]). Please see how Spark does that too [2].
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> If you have a better approach, then please elaborate with
>>>>>>>>>>>>>>>>>>>> more details and help me understand your point.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB savepoints that the number
>>>>>>>>>>>>>>>>>>>>> of keys can be extremely huge but not the per key state
>>>>>>>>>>>>>>>>>>>>> itself.
>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in
>>>>>>>>>>>>>>>>>>>>> a separate jira.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I've just created https://issues.apache.org/jira/browse/FLINK-37456.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>>>>>>>>>>>>>>>>>>>> [2] https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>> G
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 3:55 AM Shengkai Fang <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for your response.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thank you for addressing the limitations here. However, I
>>>>>>>>>>>>>>>>>>>>> believe it would be beneficial to further clarify the API in
>>>>>>>>>>>>>>>>>>>>> this FLIP regarding how users can specify the TTL column.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> One potential approach that comes to mind is using a
>>>>>>>>>>>>>>>>>>>>> standardized naming convention such as ${state-name}_ttl for
>>>>>>>>>>>>>>>>>>>>> the metadata column that defines the TTL value. In terms of
>>>>>>>>>>>>>>>>>>>>> implementation, the listReadableMetadata function could:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 1. Read the table’s columns and configuration,
>>>>>>>>>>>>>>>>>>>>> 2. Extract all defined state names, and
>>>>>>>>>>>>>>>>>>>>> 3. Return a structured list of metadata entries formatted as
>>>>>>>>>>>>>>>>>>>>>    ${state-name}_ttl.
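
The three steps above can be sketched roughly as follows. This is only an illustration in plain Python, not the connector's actual code: the option key pattern `fields.<column>.state-name`, the helper names, and the sample column and state names are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed option key pattern, used here only for illustration.
STATE_NAME_OPTION = "fields.{column}.state-name"
TTL_SUFFIX = "_ttl"


def extract_state_names(value_columns: List[str], options: Dict[str, str]) -> List[str]:
    """Steps 1-2: resolve the state name behind each value column."""
    names = []
    for column in value_columns:
        # Fall back to the column name when no explicit state name is configured.
        names.append(options.get(STATE_NAME_OPTION.format(column=column), column))
    return names


def list_ttl_metadata_keys(value_columns: List[str], options: Dict[str, str]) -> List[str]:
    """Step 3: one '<state-name>_ttl' metadata key per state."""
    return [name + TTL_SUFFIX for name in extract_state_names(value_columns, options)]


# Hypothetical table: two value columns, one with an explicit state name.
ttl_keys = list_ttl_metadata_keys(
    ["MyValueState", "orders"],
    {"fields.orders.state-name": "OrderListState"},
)
print(ttl_keys)
```

Deriving the keys from the declared columns means the set of readable metadata depends on the table definition rather than being a fixed list.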
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 2. Adding a new connector with `savepoint-metadata`
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Introducing a new connector type at this stage may
>>>>>>>>>>>>>>>>>>>>> unnecessarily complicate the system. Given that every table
>>>>>>>>>>>>>>>>>>>>> already belongs to a Catalog, which is designed to provide a
>>>>>>>>>>>>>>>>>>>>> Factory for building source or sink connectors, I propose
>>>>>>>>>>>>>>>>>>>>> integrating a dedicated StateCatalog instead. This approach
>>>>>>>>>>>>>>>>>>>>> would allow us to:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 1. Leverage the Catalog’s existing capabilities to manage TTL
>>>>>>>>>>>>>>>>>>>>>    metadata (e.g., state names and TTL logic) without
>>>>>>>>>>>>>>>>>>>>>    duplicating functionality.
>>>>>>>>>>>>>>>>>>>>> 2. Provide a unified interface for connector instantiation
>>>>>>>>>>>>>>>>>>>>>    and metadata handling through the Catalog’s Factory
>>>>>>>>>>>>>>>>>>>>>    pattern.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Would this design decision better align with our
>>>>>>>>>>>>>>>>>>>>> architecture’s extensibility and reduce redundancy?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Up until now we've seen even in TB savepoints that the number
>>>>>>>>>>>>>>>>>>>>>> of keys can be extremely huge but not the per key state
>>>>>>>>>>>>>>>>>>>>>> itself.
>>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in
>>>>>>>>>>>>>>>>>>>>>> a separate jira.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> +1 for a separate jira.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 10, 2025 at 19:05, Gabor Somogyi <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Please see my comments inline.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>>>> G
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 7:07 AM Shengkai Fang <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi, Gabor. Thanks for the FLIP. I have some questions about
>>>>>>>>>>>>>>>>>>>>>>> the FLIP:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 1. State TTL for Value Columns
>>>>>>>>>>>>>>>>>>>>>>> How can users retrieve the state TTL (Time-to-Live) for each
>>>>>>>>>>>>>>>>>>>>>>> value column? From my understanding of the current design,
>>>>>>>>>>>>>>>>>>>>>>> it seems that this functionality is not supported. Could you
>>>>>>>>>>>>>>>>>>>>>>> clarify if there are plans to address this limitation?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Since the state processor API is not yet exposing this
>>>>>>>>>>>>>>>>>>>>>> information, this would require several steps.
>>>>>>>>>>>>>>>>>>>>>> First, support needs to be added to the state processor API,
>>>>>>>>>>>>>>>>>>>>>> which can then be exposed on the SQL API.
>>>>>>>>>>>>>>>>>>>>>> This is definitely a useful future improvement and can be
>>>>>>>>>>>>>>>>>>>>>> handled in a separate jira.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 2. Metadata Table vs. Metadata Column
>>>>>>>>>>>>>>>>>>>>>>> The metadata information described in the FLIP appears to be
>>>>>>>>>>>>>>>>>>>>>>> intended to describe the state files stored at a specific
>>>>>>>>>>>>>>>>>>>>>>> location. To me, this concept aligns more closely with
>>>>>>>>>>>>>>>>>>>>>>> system tables like pg_tables in PostgreSQL [1] or the
>>>>>>>>>>>>>>>>>>>>>>> INFORMATION_SCHEMA in MySQL [2].
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Adding a new connector with `savepoint-metadata` is a
>>>>>>>>>>>>>>>>>>>>>> possibility where we can create such functionality.
>>>>>>>>>>>>>>>>>>>>>> I'm not against that, I just want to have a common agreement
>>>>>>>>>>>>>>>>>>>>>> that we would like to move in that direction.
>>>>>>>>>>>>>>>>>>>>>> (As a side note, not just PG but Spark also has a similar
>>>>>>>>>>>>>>>>>>>>>> approach, and I basically like the idea.)
>>>>>>>>>>>>>>>>>>>>>> If we went in that direction, savepoint metadata could be
>>>>>>>>>>>>>>>>>>>>>> exposed in such a way that one row represents an operator
>>>>>>>>>>>>>>>>>>>>>> with its values, something like this:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┐
>>>>>>>>>>>>>>>>>>>>>> │operatorN│operatorU│operatorH│paralleli│maxParall│subtaskSt│coordinat│totalSta│
>>>>>>>>>>>>>>>>>>>>>> │ame      │id       │ash      │sm       │elism    │atesCount│orStateSi│tesSizeI│
>>>>>>>>>>>>>>>>>>>>>> │         │         │         │         │         │         │zeInBytes│nBytes  │
>>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>>>>>>>>>>>>>>>>>>>>> │Source:  │datagen-s│47aee9439│2        │128      │2        │16       │546     │
>>>>>>>>>>>>>>>>>>>>>> │datagen-s│ource-uid│4d6ea26e2│         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> │ource    │         │d544bef0a│         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> │         │         │37bb5    │         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>>>>>>>>>>>>>>>>>>>>> │long-udf-│long-udf-│6ed3f40bf│2        │128      │2        │0        │0       │
>>>>>>>>>>>>>>>>>>>>>> │with-mast│with-mast│f3c8dfcdf│         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> │er-hook  │er-hook-u│cb95128a1│         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> │         │id       │018f1    │         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┤
>>>>>>>>>>>>>>>>>>>>>> │value-pro│value-pro│ca4f5fe9a│2        │128      │2        │0        │40726   │
>>>>>>>>>>>>>>>>>>>>>> │cess     │cess-uid │637b656f0│         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> │         │         │9ea78b3e7│         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> │         │         │a15b9    │         │         │         │         │        │
>>>>>>>>>>>>>>>>>>>>>> └─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴────────┘
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> This table can then be joined with the tables created by the
>>>>>>>>>>>>>>>>>>>>>> already existing `savepoint` connector based on the UID hash
>>>>>>>>>>>>>>>>>>>>>> (which is unique and always exists).
>>>>>>>>>>>>>>>>>>>>>> This would mean that the already existing table would need
>>>>>>>>>>>>>>>>>>>>>> only a single metadata column, which is the UID hash.
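
To make the join idea concrete, here is a small sketch in plain Python rather than Flink SQL. The operator hash and operator name come from the table above; the state rows, their column names (`k`, `MyValueState`), and the helper name are hypothetical and only stand in for what a `savepoint` table with a UID hash metadata column might return.

```python
from typing import Dict, List

# One row per operator, taken from the example metadata table.
operator_metadata = [
    {"operatorName": "Source: datagen-source",
     "operatorHash": "47aee94394d6ea26e2d544bef0a37bb5", "parallelism": 2},
    {"operatorName": "value-process",
     "operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9", "parallelism": 2},
]

# Hypothetical state rows carrying the UID hash as their only metadata column.
state_rows = [
    {"k": 1, "MyValueState": 10, "operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9"},
    {"k": 2, "MyValueState": 20, "operatorHash": "ca4f5fe9a637b656f09ea78b3e7a15b9"},
]


def join_on_operator_hash(states: List[Dict], metadata: List[Dict]) -> List[Dict]:
    """Inner join: attach the operator's metadata row to each state row."""
    by_hash = {row["operatorHash"]: row for row in metadata}
    joined = []
    for state in states:
        meta = by_hash.get(state["operatorHash"])
        if meta is not None:
            joined.append({**state, **meta})
    return joined


joined_rows = join_on_operator_hash(state_rows, operator_metadata)
for row in joined_rows:
    print(row["k"], row["MyValueState"], row["operatorName"])
```

Because the hash is unique per operator, the lookup is one-to-many from metadata to state rows, so the metadata never has to be repeated inside the state table itself.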
>>>>>>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>>>>>> @zakelly, plz share your thoughts too.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> If we opt to use metadata columns, every record in the table
>>>>>>>>>>>>>>>>>>>>>>> would end up having identical values for these columns
>>>>>>>>>>>>>>>>>>>>>>> (please correct me if I’m mistaken). On the other hand, the
>>>>>>>>>>>>>>>>>>>>>>> state connector requires users to specify an operator UID or
>>>>>>>>>>>>>>>>>>>>>>> operator UID hash, after which it outputs user-defined
>>>>>>>>>>>>>>>>>>>>>>> values in its records. This approach feels somewhat
>>>>>>>>>>>>>>>>>>>>>>> redundant to me.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> If we add a new `savepoint-metadata` connector, then this can
>>>>>>>>>>>>>>>>>>>>>> be addressed.
>>>>>>>>>>>>>>>>>>>>>> On the other hand, UID and UID hash have an either-or
>>>>>>>>>>>>>>>>>>>>>> relationship from a config perspective, so when a user
>>>>>>>>>>>>>>>>>>>>>> provides the UID, he/she may be interested in the hash for
>>>>>>>>>>>>>>>>>>>>>> further calculations (all of Flink's internals depend on the
>>>>>>>>>>>>>>>>>>>>>> hash). Printing out the human-readable UID is an explicit
>>>>>>>>>>>>>>>>>>>>>> requirement from the user side because hashes are not human
>>>>>>>>>>>>>>>>>>>>>> readable.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 3. Handling LIST and MAP States in the State Connector
>>>>>>>>>>>>>>>>>>>>>>> I have concerns about how the current design handles LIST
>>>>>>>>>>>>>>>>>>>>>>> and MAP states. Specifically, the state connector uses Flink
>>>>>>>>>>>>>>>>>>>>>>> SQL’s MAP and ARRAY types, which implies that it attempts to
>>>>>>>>>>>>>>>>>>>>>>> load entire MAP or LIST states into memory.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> However, in many real-world scenarios, these states can grow
>>>>>>>>>>>>>>>>>>>>>>> very large. Typically, the state API addresses this by
>>>>>>>>>>>>>>>>>>>>>>> providing an iterator to traverse elements within the state
>>>>>>>>>>>>>>>>>>>>>>> incrementally. I’m unsure whether I’ve missed something in
>>>>>>>>>>>>>>>>>>>>>>> FLIP-496 or FLIP-512, but it seems that the current design
>>>>>>>>>>>>>>>>>>>>>>> might struggle with scalability in such cases.
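
The difference between materializing a whole MAP state and traversing it incrementally can be illustrated with a generic Python sketch. Nothing here is Flink API: `read_materialized`, `iter_in_batches`, and the fake state entries are hypothetical names used only to show the memory trade-off being discussed.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Entry = Tuple[str, int]


def read_materialized(entries: Iterable[Entry]) -> Dict[str, int]:
    """Loads every entry of one key's MAP state into memory at once."""
    return dict(entries)


def iter_in_batches(entries: Iterable[Entry], batch_size: int) -> Iterator[List[Entry]]:
    """Traverses the state incrementally, holding at most batch_size entries."""
    iterator = iter(entries)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_map_state(size: int) -> Iterator[Entry]:
    """Stand-in for a state iterator; entries are produced lazily."""
    return ((f"k{i}", i) for i in range(size))


whole_map = read_materialized(fake_map_state(5))
batch_sizes = [len(batch) for batch in iter_in_batches(fake_map_state(5), 2)]
print(len(whole_map), batch_sizes)
```

With the batched reader the peak memory is bounded by the batch size, while the materialized read grows with the size of the per-key state.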
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> You're right, the current implementation keeps the state for
>>>>>>>>>>>>>>>>>>>>>> a single key in memory.
>>>>>>>>>>>>>>>>>>>>>> Back in the day we considered this potential issue and
>>>>>>>>>>>>>>>>>>>>>> concluded that this is not necessarily needed for the initial
>>>>>>>>>>>>>>>>>>>>>> version and can be done as a later improvement.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Up until now we've seen, even in TB savepoints, that the
>>>>>>>>>>>>>>>>>>>>>> number of keys can be extremely large but not the per-key
>>>>>>>>>>>>>>>>>>>>>> state itself.
>>>>>>>>>>>>>>>>>>>>>> But again, this is a good feature as-is and can be handled in
>>>>>>>>>>>>>>>>>>>>>> a separate jira.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/view-pg-tables.html
>>>>>>>>>>>>>>>>>>>>>>> [2] https://dev.mysql.com/doc/refman/8.4/en/information-schema-tables-table.html
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 3, 2025 at 02:00, Gabor Somogyi <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Hi Zakelly,
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> To keep things simple, `METADATA VIRTUAL` as the keywords
>>>>>>>>>>>>>>>>>>>>>>>> for definition is the target.
>>>>>>>>>>>>>>>>>>>>>>>> If it's not super complex, the latter can be added too.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>>>>>> G
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Mar 2, 2025 at 3:37 PM Zakelly Lan <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Gabor,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> +1 for this.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Will the metadata column use `METADATA VIRTUAL` as keywords
>>>>>>>>>>>>>>>>>>>>>>>>> for definition, or `METADATA FROM xxx VIRTUAL` for
>>>>>>>>>>>>>>>>>>>>>>>>> renaming, just like the Kafka table?
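
For readers unfamiliar with the two forms being compared, the sketch below renders both column declarations as DDL fragments. It is a plain Python string helper, not part of Flink; the function name and the metadata key `operator_uid_hash` are hypothetical placeholders, and the FLIP does not fix these names here.

```python
from typing import Optional


def metadata_column(column: str, sql_type: str, metadata_key: Optional[str] = None) -> str:
    """Render a read-only metadata column declaration.

    Without a key (or when the key equals the column name) the short
    `METADATA VIRTUAL` form is used; otherwise the column is renamed
    with `METADATA FROM '<key>' VIRTUAL`.
    """
    if metadata_key is None or metadata_key == column:
        return f"`{column}` {sql_type} METADATA VIRTUAL"
    return f"`{column}` {sql_type} METADATA FROM '{metadata_key}' VIRTUAL"


# Hypothetical metadata key, chosen only for the example.
same_name = metadata_column("operator_uid_hash", "STRING")
renamed = metadata_column("op_hash", "STRING", "operator_uid_hash")
print(same_name)
print(renamed)
```

The renaming form only matters when the table column should carry a different name than the metadata key the connector exposes.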
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>> Zakelly
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Mar 1, 2025 at 1:31 PM Gabor Somogyi <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to start a discussion of FLIP-512: Add meta
>>>>>>>>>>>>>>>>>>>>>>>>>> information to SQL state connector [1].
>>>>>>>>>>>>>>>>>>>>>>>>>> Feel free to add your thoughts to make this feature better.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-512%3A+Add+meta+information+to+SQL+state+connector
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>>>>>>>> G
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
