Re: [DISCUSS] FLIP-521: Integrating Variant Type into Flink: Enabling Efficient Semi-Structured Data Processing

Xuannan Su Fri, 25 Apr 2025 04:47:28 -0700

Hi everyone,

Thank you for all the comments! If there are no further comments, I'd
like to close the discussion and start the vote next Monday.


Best,
Xuannan

On Fri, Apr 25, 2025 at 7:41 PM Lincoln Lee <[email protected]> wrote:
>
> +1 for this FLIP. VARIANT type support will be a great addition to SQL.
> I look forward to the detailed design of the follow-up shredding
> optimizations.
>
>
> Best,
> Lincoln Lee
>
>
> On Tue, Apr 22, 2025 at 4:51 PM Timo Walther <[email protected]> wrote:
>
> > +1 for this feature. Having a VARIANT type makes a lot of sense and
> > together with an OBJECT type will make semi-structured data processing
> > in Flink easier.
> >
> > Currently, I'm catching up with notifications after the Easter holidays,
> > but I'm happy to give some feedback by tomorrow or Thursday as well.
> >
> > Thanks,
> > Timo
> >
> > On 22.04.25 10:40, Jingsong Li wrote:
> > > Thanks Xuannan for driving this discussion.
> > >
> > > At present, communities such as Apache Iceberg, Delta, Spark, and
> > > Parquet are all designing and developing around Variant, so Variant
> > > support in Flink would be very valuable.
> > >
> > > After a rough look at the design, I see no major problems. It is
> > > designed around Parquet's Variant standard, similar to the overall
> > > design in Spark SQL.
> > >
> > > +1 for this.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Apr 14, 2025 at 6:12 PM Xuannan Su <[email protected]> wrote:
> > >>
> > >> Hi devs,
> > >>
> > >> I’d like to start a discussion around FLIP-521: Integrating Variant
> > >> Type into Flink: Enabling Efficient Semi-Structured Data
> > >> Processing[1]. Working with semi-structured data has long been a
> > >> foundational use case of the Lakehouse. While JSON has traditionally
> > >> served as the primary storage format for such data, storing it as
> > >> serialized strings introduces significant inefficiencies, since the
> > >> text must be parsed again whenever a field is accessed.
> > >>
> > >> In this FLIP, we propose integrating the Variant encoding, a compact
> > >> binary representation of semi-structured data[2], to improve the
> > >> performance of semi-structured data processing. Since Paimon recently
> > >> added support for the Variant type[3], this FLIP would allow Flink to
> > >> further leverage Paimon's storage-layer optimizations, improving
> > >> performance and resource utilization for semi-structured data
> > >> pipelines.
> > >>
> > >> Best,
> > >> Xuannan
> > >>
> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-521%3A+Integrating+Variant+Type+into+Flink%3A+Enabling+Efficient+Semi-Structured+Data+Processing
> > >> [2] https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
> > >> [3] https://github.com/apache/paimon/issues/4471
> > >
> >
> >
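For readers new to the Variant encoding referenced in [2] above: a Variant value is a byte string whose first byte packs a 2-bit basic type (primitive, short string, object, or array) into its low bits and a 6-bit type-specific header into its high bits. The Python sketch below illustrates this for scalar values only, based on our reading of the Parquet spec. It is not Flink or Paimon code; the names encode_scalar and decode_scalar are hypothetical, and the metadata dictionary, objects, arrays, decimals, dates, and timestamps defined by the spec are deliberately omitted.

```python
import struct

# Sketch only, assuming the Parquet Variant encoding spec; not a Flink/Paimon API.
# Basic types live in the low 2 bits of the first (header) byte.
PRIMITIVE, SHORT_STRING = 0, 1

# A subset of the spec's primitive type IDs (stored in the upper 6 bits).
NULL, TRUE, FALSE, INT8, INT16, INT32, INT64, DOUBLE, STRING = 0, 1, 2, 3, 4, 5, 6, 7, 16

INT_FORMATS = ((INT8, "<b"), (INT16, "<h"), (INT32, "<i"), (INT64, "<q"))


def header(basic_type: int, value_header: int) -> bytes:
    """Pack the 6-bit value_header and the 2-bit basic_type into one byte."""
    return bytes([(value_header << 2) | basic_type])


def encode_scalar(v) -> bytes:
    """Encode a Python scalar as a Variant value (hypothetical helper)."""
    if v is None:
        return header(PRIMITIVE, NULL)
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return header(PRIMITIVE, TRUE if v else FALSE)
    if isinstance(v, int):
        # Use the narrowest signed little-endian integer width that fits.
        for type_id, fmt in INT_FORMATS:
            try:
                return header(PRIMITIVE, type_id) + struct.pack(fmt, v)
            except struct.error:
                continue
        raise OverflowError("integer does not fit in int64")
    if isinstance(v, float):  # Python floats are IEEE 754 doubles
        return header(PRIMITIVE, DOUBLE) + struct.pack("<d", v)
    if isinstance(v, str):
        data = v.encode("utf-8")
        if len(data) <= 63:  # short string: the length fits in the header byte
            return header(SHORT_STRING, len(data)) + data
        # Longer strings: primitive STRING with a 4-byte little-endian length.
        return header(PRIMITIVE, STRING) + struct.pack("<I", len(data)) + data
    raise TypeError(f"unsupported type: {type(v).__name__}")


def decode_scalar(buf: bytes):
    """Decode a value produced by encode_scalar (hypothetical helper)."""
    basic_type, value_header = buf[0] & 0b11, buf[0] >> 2
    if basic_type == SHORT_STRING:
        return buf[1:1 + value_header].decode("utf-8")
    if basic_type != PRIMITIVE:
        raise ValueError("objects and arrays are not handled in this sketch")
    if value_header == NULL:
        return None
    if value_header in (TRUE, FALSE):
        return value_header == TRUE
    fmts = dict(INT_FORMATS)
    fmts[DOUBLE] = "<d"
    if value_header in fmts:
        return struct.unpack_from(fmts[value_header], buf, 1)[0]
    if value_header == STRING:
        (n,) = struct.unpack_from("<I", buf, 1)
        return buf[5:5 + n].decode("utf-8")
    raise ValueError(f"unsupported primitive type id {value_header}")
```

For example, encode_scalar(42) yields the two bytes 0x0C 0x2A (type ID 3, int8, in the high six bits; basic type 0 in the low two), and encode_scalar("hi") yields 0x09 followed by the UTF-8 bytes, because short strings of up to 63 bytes carry their length in the header. Reading a typed value therefore needs no text parsing. In the full spec, object field names are additionally stored once in a separate metadata dictionary and referenced by ID, which this sketch leaves out.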
