Hi Gyula,

Thanks for driving this discussion. The idea of exposing Flink state through a 
standard SQL catalog is compelling, and the overall design is well thought out.
I have a few questions regarding edge cases that the FLIP doesn't seem to 
address explicitly:

Checkpoint retention and data availability. Since the catalog operates
independently of the running job, there appears to be no mechanism to "pin" a
checkpoint during a SQL session. This means a checkpoint discovered at session
start could be subsumed (and its files removed) by the time a query actually
reads them. How should we handle this race condition: should there be some
form of reference counting, or is it the user's responsibility to ensure
sufficient retention?
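
To make the reference-counting option concrete, here is a minimal sketch of
the idea. Everything in it is hypothetical: CheckpointPinRegistry and its
methods are illustrative names, not part of Flink or of the FLIP, and I am
assuming that whatever deletes subsumed checkpoints could consult such a
registry first.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not an existing Flink class: counts how many
// active queries are currently reading each checkpoint.
final class CheckpointPinRegistry {

    // checkpoint id -> number of active readers
    private final Map<Long, Integer> pins = new ConcurrentHashMap<>();

    /** Called when a query starts reading a checkpoint. */
    void pin(long checkpointId) {
        pins.merge(checkpointId, 1, Integer::sum);
    }

    /** Called when the query finishes, ideally from a finally block. */
    void unpin(long checkpointId) {
        // Returning null from the remapping function removes the entry.
        pins.computeIfPresent(
                checkpointId, (id, count) -> count > 1 ? count - 1 : null);
    }

    /** Cleanup would check this before discarding a subsumed checkpoint. */
    boolean canDiscard(long checkpointId) {
        return !pins.containsKey(checkpointId);
    }
}
```

An in-memory registry like this only works if readers and checkpoint cleanup
share a process. Since the catalog runs independently of the job, a real
solution would presumably need the pin to be visible to the job as well, which
is part of what I am asking about.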

Incremental checkpoints. The same retention question applies to incremental
checkpoints, whose integrity depends on multiple previous checkpoints.

Backend-specific formats. Given that checkpoints and native savepoints use
backend-specific storage formats, does the implementation require a separate
reader for each state backend?

Overall, this is an excellent addition to the Flink ecosystem. Looking forward 
to seeing it materialize!

Best,
Han Yin

> On June 29, 2026, at 17:53, Gyula Fóra <[email protected]> wrote:
> 
> Hi Flink Devs!
> 
> I would like to start the discussion about FLIP-599: State Catalog [1]
> 
> State and stateful processing has always been one of the most fundamental
> features of Flink and a major contributor to its success and global
> adoption.
> 
> Over the years several apis and methods have been developed to address the
> need for external access and analytics such as the state processor
> datastream / java apis, the since deprecated queryable state abstractions
> and more recently a number of table / SQL api connectors to access state
> metadata and keyed states in a somewhat limited way.
> 
> Extending the current capabilities of the state-process-api, this FLIP aims
> to lift state processing,  analytics and observability to a new level by
> introducing the State Catalog.
> 
> State Catalog is a Flink SQL Catalog implementation that allows discovering
> savepoints/checkpoints and mapping their state automatically to SQL tables.
> The tables are derived for the different operators and their keyed states
> with schema matching the state structure. Most importantly it supports
> reading POJO / Avro and other structured and basic type states without the
> original user classes (dependencies) by relying on Flink's transparent and
> efficiently structured serializer formats.
> 
> We have a fully functional prototype implementation developed with Gabor
> Somogyi that we will be happy to share if the community accepts the
> proposal!
> 
> Looking forward to your feedback and suggestions!
> 
> Gyula
> 
> [1]
> https://cwiki.apache.org/confluence/spaces/FLINK/pages/438009922/FLIP-599+State+Catalog
