Re: [DISCUSS] FLIP-599: State Catalog

Gyula Fóra Thu, 02 Jul 2026 05:26:36 -0700

Hi Dennis!

Thank you for the questions. Much of the recent work on the state connector
API has been aimed at exactly this kind of cataloging and flexible access.
There are a few gaps and things that have to change, and not all of them are
enumerated in the FLIP, but we should keep an open mind and make all the
changes needed, as you said, to make this as complete as possible. Most
state processor APIs are marked experimental, so we can be flexible within
reason :)


Now to the concrete questions:

1. Non-keyed state support / scope
I think non-keyed states should definitely be in the scope of the FLIP in
terms of design. My intention was not to exclude them; I just focused on
keyed state because that is readily available in our prototype
implementation (without many changes to the existing connectors). I will
try to update the FLIP to cover non-keyed states in more detail, but I
think the case is pretty straightforward. From a table representation
perspective, they can follow a similar naming pattern, such as
uid_opUID_statename_broadcast and uid_opUID_statename_list. A corresponding
SQL connector can easily be added to support these based on the existing
DataStream connector. I will make sure to add separate tickets for these
types of state once the FLIP is accepted; this work can very easily be
parallelized across the different state types within the existing catalog
framework. This way keyed and non-keyed states will live together in a
single catalog/db.
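
To make the naming pattern concrete, here is a minimal sketch of how such
table names could be derived. It is only an illustration, not code from the
prototype: the class, enum and method names are hypothetical, and only the
two suffixes mentioned above are covered.

```java
import java.util.Locale;

// Hypothetical helper, not part of Flink or the FLIP prototype.
final class StateTableNames {

    // Only the two non-keyed kinds named in this thread; others are left out.
    enum NonKeyedKind { BROADCAST, LIST }

    // Builds "uid_<operator uid>_<state name>_<kind>" as in the pattern above.
    static String tableName(String operatorUid, String stateName, NonKeyedKind kind) {
        return "uid_" + operatorUid + "_" + stateName + "_"
                + kind.name().toLowerCase(Locale.ROOT);
    }
}
```

For an operator uid "enricher" with a broadcast state called "rules", this
would produce a table named uid_enricher_rules_broadcast.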

In the future we can even go a step further and include connector-specific
state views, such as Kafka offsets, via custom connector-specific plugins.

2/3. Serializer transparency and robustness
From a practical standpoint, generated (synthetic) serializers, custom
classes / Kryo, and pluggable logic could all work, but the catalog concept
as a whole requires certain behaviour to be useful. The catalog would point
to savepoint directories and discover all state in them (potentially from
multiple jobs), so configuration has to be done in a generic way. I don't
see a problem with introducing configs for specifying custom
serializers/factories, either generically or for specific classes. In most
cases, however, this won't be necessary, as the state snapshot itself
usually holds a reference (class name) to the original user classes. If the
catalog process has access to those classes it will use them directly, or
otherwise any other configured serializers, and only if neither is
available will it fall back to generating serializers for POJO/Tuple types.
There is obviously a limit to what is possible here initially, Kryo being
one exception where you either have the class or you don't.
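
The resolution order described above can be sketched roughly as follows.
This is a self-contained illustration under stated assumptions, not the
prototype's implementation: all names are hypothetical, the "configured
serializers" are modelled as a plain map from class name to serializer
class name, and no Flink APIs are used.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the fallback order; not Flink code.
final class SerializerResolutionSketch {

    enum TypeKind { POJO, TUPLE, KRYO }

    enum Source { USER_CLASS, CONFIGURED, GENERATED }

    // True if the class named in the snapshot can be loaded by this process.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false,
                    SerializerResolutionSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // 1) user class, 2) configured serializer, 3) generated for POJO/Tuple.
    // An empty result means the state cannot be decoded (e.g. Kryo without
    // the class available).
    static Optional<Source> resolve(
            String className, TypeKind kind, Map<String, String> configured) {
        if (onClasspath(className)) {
            return Optional.of(Source.USER_CLASS);
        }
        if (configured.containsKey(className)) {
            return Optional.of(Source.CONFIGURED);
        }
        if (kind == TypeKind.POJO || kind == TypeKind.TUPLE) {
            return Optional.of(Source.GENERATED);
        }
        return Optional.empty();
    }
}
```

What happens in the empty case (hard error, raw bytes, skipping the column)
is deliberately left open here.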

I would like to point out, however, that we do not have to support
everything initially. We can start with what is currently available (the
classpath and generated serializers); as we develop we will find the limits
of this approach and can then extend it with configuration where that feels
natural, instead of trying to create a super complex initial solution. But I
definitely agree that we should support custom serializers already specified
in the config that Flink otherwise uses for the jobs (I think this should
more or less work out of the box).

4. The metadata view currently reuses the existing table-valued function.
Let's take improving / extending the metadata view as a follow-up under
this umbrella. I don't think we need a separate FLIP, but it also feels out
of scope here.

Cheers
Gyula




On Thu, Jul 2, 2026 at 1:02 PM Dennis-Mircea Ciupitu <
[email protected]> wrote:

> Hi all,
>
> Thank you for driving this. Being able to discover savepoints/checkpoints
> and query their state as SQL tables without shipping the original user
> classes is a genuinely valuable addition, and it's nice that it builds on
> the existing state-table connector and savepoint_metadata work rather than
> starting from scratch.
>
> A few points and questions, mostly around scope and the serializer story:
>
>    1. Non-keyed state and the DataStream path.
>       - The FLIP scopes out BroadcastState, operator ListState and
>       UnionState because "no readily available Table API connectors exist
>       for these state types." That's a fair characterization of the Table
>       layer, but the state-processor DataStream API already reads all
>       three today (SavepointReader#readBroadcastState / #readUnionState /
>       #readListState). So the limitation is really in the keyed-only SQL
>       mapping (KeyedStateReader runs inside a keyed backend), not in the
>       snapshots themselves.
>       - Is the keyed-only scope a deliberate UX/table-mapping decision, or
>       would a DataStream-backed reader be considered so the catalog isn't
>       strictly less capable than the API it extends? Even if non-keyed
>       contents stay out of scope initially, it would be good to frame this
>       explicitly as a Table-mapping constraint rather than a general one.
>    2. Serializer transparency - the "no user classes" premise vs. custom
>    serializers.
>       - The design relies on Flink's transparent serializer formats to
>       decode state without user dependencies, which is great for
>       POJO/Avro/basic types. But two serialization efforts point the other
>       way: FLIP-398 [1] (released) already lets users configure
>       serializers per type via pipeline.serialization-config, and FLIP-538
>       [2] (in discussion) adds pluggable custom generic-type serializers
>       (e.g. Apache Fory) and promotes TypeSerializer/TypeSerializerSnapshot
>       to @Public. As FLIP-538 itself notes, state written with a custom
>       serializer becomes dependent on that serializer to decode - external
>       tooling without it cannot read those bytes.
>       - Could we make the deserialization side pluggable and config-driven,
>       mirroring FLIP-398's serialization-config, with a graceful fallback
>       (e.g. expose the raw bytes / skip the column) when a format isn't
>       transparently decodable? There already seems to be a seam for this
>       (SavepointTypeInformationFactory), and making it a first-class,
>       config-selectable option would keep the catalog forward-compatible as
>       serialization support grows.
>    3. Robustness of the transparent decoding path.
>       - Related to (2): reconstructing values by mirroring the binary
>       layout (PojoToRowDataDeserializer) is the most powerful but also the
>       most fragile part of the design. How is it expected to behave across
>       serializer schema evolution / state migration (a serializer snapshot
>       that differs from the writer's), Kryo-fallback fields,
>       nested/generic types, and nullability?
>       - It would help to spell out the supported matrix and the failure
>       mode (hard error vs. degrade to raw bytes) up front, since this is
>       exactly where "read without the user classes" is most likely to
>       break in practice.
>    4. Observability / summary reporting.
>       - The metadata view is a great start. Two small asks:
>          - per-subtask (or per-key-group) size granularity in addition to
>          per-operator, since skew is usually what you are chasing on large
>          state;
>          - optionally rounding out the size breakdown with managed/raw
>          operator state and channel state sizes for a full picture (noting
>          the latter are in-flight / unaligned-checkpoint buffers rather
>          than user state).
>       - A prominent upfront summary of the largest operators / state is
>       often what users want before drilling in.
>
>
> Best regards,
> Dennis
>
> [1]
>
> https://cwiki.apache.org/confluence/spaces/FLINK/pages/282102217/FLIP-398+Improve+Serialization+Configuration+And+Usage+In+Flink
> [2]
>
> https://cwiki.apache.org/confluence/spaces/FLINK/pages/373886828/FLIP-538+Support+Custom+Generic+Type+Serializer
>
> On Mon, Jun 29, 2026 at 12:53 PM Gyula Fóra <[email protected]> wrote:
>
> > Hi Flink Devs!
> >
> > I would like to start the discussion about FLIP-599: State Catalog [1]
> >
> > State and stateful processing has always been one of the most fundamental
> > features of Flink and a major contributor to its success and global
> > adoption.
> >
> > Over the years several apis and methods have been developed to address
> > the need for external access and analytics such as the state processor
> > datastream / java apis, the since deprecated queryable state abstractions
> > and more recently a number of table / SQL api connectors to access state
> > metadata and keyed states in a somewhat limited way.
> >
> > Extending the current capabilities of the state-process-api, this FLIP
> > aims to lift state processing, analytics and observability to a new
> > level by introducing the State Catalog.
> >
> > State Catalog is a Flink SQL Catalog implementation that allows
> > discovering savepoints/checkpoints and mapping their state automatically
> > to SQL tables. The tables are derived for the different operators and
> > their keyed states with schema matching the state structure. Most
> > importantly it supports reading POJO / Avro and other structured and
> > basic type states without the original user classes (dependencies) by
> > relying on Flink's transparent and efficiently structured serializer
> > formats.
> >
> > We have a fully functional prototype implementation developed with Gabor
> > Somogyi that we will be happy to share if the community accepts the
> > proposal!
> >
> > Looking forward to your feedback and suggestions!
> >
> > Gyula
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/spaces/FLINK/pages/438009922/FLIP-599+State+Catalog
> >
>
