Re: [DISCUSS] FLIP-521: Integrating Variant Type into Flink: Enabling Efficient Semi-Structured Data Processing

Timo Walther Tue, 22 Apr 2025 01:52:03 -0700

+1 for this feature. Having a VARIANT type makes a lot of sense and together with an OBJECT type will make semi-structured data processing in Flink easier.

Currently, I'm catching up with notifications after the Easter holidays, but happy to give some feedback by tomorrow or Thursday as well.


Thanks,
Timo

On 22.04.25 10:40, Jingsong Li wrote:

Thanks Xuannan for driving this discussion.

At present, communities such as Apache Iceberg, Delta, Spark, and
Parquet are all designing and developing around Variant, so Flink
support for Variant would be very valuable.

After a rough look at the design, I see no overall problems. It is
designed around Parquet's Variant standard, which is similar to the
overall design of Spark SQL.

+1 for this.

Best,
Jingsong

On Mon, Apr 14, 2025 at 6:12 PM Xuannan Su <[email protected]> wrote:


Hi devs,

I’d like to start a discussion around FLIP-521: Integrating Variant
Type into Flink: Enabling Efficient Semi-Structured Data
Processing[1]. Working with semi-structured data has long been a
foundational scenario of the Lakehouse. While JSON has traditionally
served as the primary storage format for such data, its implementation
as serialized strings introduces significant inefficiencies.

In this FLIP, we integrate the Variant encoding, which is a compact
binary representation of semi-structured data[2], to improve the
performance of processing semi-structured data. As Paimon has
supported the Variant type recently[3], this FLIP would allow Flink to
further leverage Paimon's storage-layer optimizations, improving
performance and resource utilization for semi-structured data
pipelines.

Best,
Xuannan

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-521%3A+Integrating+Variant+Type+into+Flink%3A+Enabling+Efficient+Semi-Structured+Data+Processing
[2] https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
[3] https://github.com/apache/paimon/issues/4471
