Hi all,

Thank you for driving this. Being able to discover savepoints/checkpoints
and query their state as SQL tables without shipping the original user
classes is a genuinely valuable addition, and it's nice that it builds on
the existing state-table connector and savepoint_metadata work rather than
starting from scratch.

A few points and questions, mostly around scope and the serializer story:

   1. Non-keyed state and the DataStream path.
      - The FLIP scopes out BroadcastState, operator ListState and
      UnionState because "no readily available Table API connectors exist for
      these state types." That's a fair characterization of the Table layer,
      but the state-processor DataStream API already reads all three today
      (SavepointReader#readBroadcastState / #readUnionState / #readListState).
      So the limitation is really in the keyed-only SQL mapping
      (KeyedStateReader runs inside a keyed backend), not in the snapshots
      themselves.
      - Is the keyed-only scope a deliberate UX/table-mapping decision, or
      would a DataStream-backed reader be considered so the catalog isn't
      strictly less capable than the API it extends? Even if non-keyed contents
      stay out of scope initially, it would be good to frame this explicitly
      as a Table-mapping constraint rather than a general one.
   2. Serializer transparency - the "no user classes" premise vs. custom
   serializers.
      - The design relies on Flink's transparent serializer formats to
      decode state without user dependencies, which is great for
      POJO/Avro/basic types. But two serialization efforts point the other
      way: FLIP-398 [1] (released) already lets users configure serializers
      per type via pipeline.serialization-config, and FLIP-538 [2] (in
      discussion) adds pluggable custom generic-type serializers (e.g. Apache
      Fory) and promotes TypeSerializer/TypeSerializerSnapshot to @Public. As
      FLIP-538 itself notes, state written with a custom serializer becomes
      dependent on that serializer to decode - external tooling without it
      cannot read those bytes.
      - Could we make the deserialization side pluggable and config-driven,
      mirroring FLIP-398's serialization-config, with a graceful fallback (e.g.
      expose the raw bytes / skip the column) when a format isn't transparently
      decodable? There already seems to be a seam for this
      (SavepointTypeInformationFactory), and making it a first-class,
      config-selectable option would keep the catalog forward-compatible as
      serialization support grows.
   3. Robustness of the transparent decoding path.
      - Related to (2): reconstructing values by mirroring the binary
      layout (PojoToRowDataDeserializer) is the most powerful but also the most
      fragile part of the design. How is it expected to behave across
      serializer schema evolution / state migration (a serializer snapshot
      that differs from the writer's), Kryo-fallback fields, nested/generic
      types, and nullability?
      - It would help to spell out the supported matrix and the failure
      mode (hard error vs. degrade to raw bytes) up front, since this is
      exactly where "read without the user classes" is most likely to break in
      practice.
   4. Observability / summary reporting.
      - The metadata view is a great start. Two small asks:
         - per-subtask (or per-key-group) size granularity in addition to
         per-operator, since skew is usually what you are chasing on large
         state;
         - optionally rounding out the size breakdown with managed/raw
         operator state and channel state sizes for a full picture (noting the
         latter are in-flight / unaligned-checkpoint buffers rather than user
         state).
      - A prominent upfront summary of the largest operators / state is
      often what users want before drilling in.
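
To make the fallback idea in (2) and (3) a bit more concrete, here is a rough,
plain-Java sketch of the behavior I have in mind. It is only an illustration:
StateDecodingSketch, ValueDecoder, Decoded and the example serializer names
are all hypothetical and do not exist in Flink or in the FLIP. The idea is that
decoders are looked up by the serializer class name (assumed to be available
from the snapshot), and that a missing or failing decoder degrades to the raw
bytes instead of failing the whole query.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Flink today.
final class StateDecodingSketch {

    /** Turns the serialized bytes of one state value into a SQL-friendly object. */
    interface ValueDecoder {
        Object decode(byte[] bytes) throws Exception;
    }

    /** Outcome of a decode attempt: a decoded value, or the untouched bytes. */
    static final class Decoded {
        final Object value;
        final boolean rawFallback;

        Decoded(Object value, boolean rawFallback) {
            this.value = value;
            this.rawFallback = rawFallback;
        }
    }

    // Assumption: decoders are keyed by the writer's serializer class name.
    private final Map<String, ValueDecoder> decoders = new HashMap<>();

    void register(String serializerClassName, ValueDecoder decoder) {
        decoders.put(serializerClassName, decoder);
    }

    Decoded decode(String serializerClassName, byte[] bytes) {
        ValueDecoder decoder = decoders.get(serializerClassName);
        if (decoder == null) {
            // No decoder configured: expose the raw bytes rather than fail.
            return new Decoded(bytes, true);
        }
        try {
            return new Decoded(decoder.decode(bytes), false);
        } catch (Exception e) {
            // Decoder present but the layout did not match: same fallback.
            return new Decoded(bytes, true);
        }
    }

    public static void main(String[] args) {
        StateDecodingSketch catalog = new StateDecodingSketch();
        catalog.register("com.example.Utf8Serializer",
                bytes -> new String(bytes, StandardCharsets.UTF_8));

        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        Decoded known = catalog.decode("com.example.Utf8Serializer", payload);
        Decoded unknown = catalog.decode("com.example.CustomSerializer", payload);

        System.out.println(known.value + " rawFallback=" + known.rawFallback);
        System.out.println(((byte[]) unknown.value).length
                + " bytes rawFallback=" + unknown.rawFallback);
    }
}
```

Whether the fallback should be raw bytes, a skipped column, or a hard error is
exactly the decision I think the FLIP should state explicitly; the sketch just
picks raw bytes to show the shape.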

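For the summary ask in (4), this is roughly the computation I would expect the
upfront report to do. Again a hedged sketch in plain Java: the input shape
(operator name mapped to per-subtask state sizes in bytes), the class names and
the skew definition (largest subtask divided by the mean subtask size) are my
own assumptions, not something the FLIP or the metadata view defines.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the input shape and all names are assumptions.
final class StateSizeSummarySketch {

    static final class OperatorSummary {
        final String operator;
        final long totalBytes;
        final long maxSubtaskBytes;
        final double skew; // largest subtask size divided by mean subtask size

        OperatorSummary(String operator, long totalBytes, long maxSubtaskBytes, double skew) {
            this.operator = operator;
            this.totalBytes = totalBytes;
            this.maxSubtaskBytes = maxSubtaskBytes;
            this.skew = skew;
        }
    }

    /** Aggregates per-subtask sizes per operator, largest total first. */
    static List<OperatorSummary> summarize(Map<String, long[]> bytesPerSubtask) {
        List<OperatorSummary> result = new ArrayList<>();
        for (Map.Entry<String, long[]> entry : bytesPerSubtask.entrySet()) {
            long[] sizes = entry.getValue();
            long total = 0;
            long max = 0;
            for (long size : sizes) {
                total += size;
                max = Math.max(max, size);
            }
            // Empty or all-zero state is reported as perfectly even (skew 1.0).
            double skew = total == 0 ? 1.0 : max / ((double) total / sizes.length);
            result.add(new OperatorSummary(entry.getKey(), total, max, skew));
        }
        result.sort(Comparator.comparingLong((OperatorSummary s) -> s.totalBytes).reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, long[]> sizes = new LinkedHashMap<>();
        sizes.put("source", new long[] {10L, 10L});
        sizes.put("window", new long[] {100L, 100L, 400L});

        for (OperatorSummary s : summarize(sizes)) {
            System.out.println(s.operator + " total=" + s.totalBytes
                    + " maxSubtask=" + s.maxSubtaskBytes + " skew=" + s.skew);
        }
    }
}
```

The same aggregation could of course be a SQL view over the metadata table;
the point is only that per-subtask sizes are needed as input for it.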

Best regards,
Dennis

[1]
https://cwiki.apache.org/confluence/spaces/FLINK/pages/282102217/FLIP-398+Improve+Serialization+Configuration+And+Usage+In+Flink
[2]
https://cwiki.apache.org/confluence/spaces/FLINK/pages/373886828/FLIP-538+Support+Custom+Generic+Type+Serializer

On Mon, Jun 29, 2026 at 12:53 PM Gyula Fóra <[email protected]> wrote:

> Hi Flink Devs!
>
> I would like to start the discussion about FLIP-599: State Catalog [1]
>
> State and stateful processing has always been one of the most fundamental
> features of Flink and a major contributor to its success and global
> adoption.
>
> Over the years several apis and methods have been developed to address the
> need for external access and analytics such as the state processor
> datastream / java apis, the since deprecated queryable state abstractions
> and more recently a number of table / SQL api connectors to access state
> metadata and keyed states in a somewhat limited way.
>
> Extending the current capabilities of the state-process-api, this FLIP aims
> to lift state processing,  analytics and observability to a new level by
> introducing the State Catalog.
>
> State Catalog is a Flink SQL Catalog implementation that allows discovering
> savepoints/checkpoints and mapping their state automatically to SQL tables.
> The tables are derived for the different operators and their keyed states
> with schema matching the state structure. Most importantly it supports
> reading POJO / Avro and other structured and basic type states without the
> original user classes (dependencies) by relying on Flink's transparent and
> efficiently structured serializer formats.
>
> We have a fully functional prototype implementation developed with Gabor
> Somogyi that we will be happy to share if the community accepts the
> proposal!
>
> Looking forward to your feedback and suggestions!
>
> Gyula
>
> [1]
>
> https://cwiki.apache.org/confluence/spaces/FLINK/pages/438009922/FLIP-599+State+Catalog
>
