Hi Xuannan,
This looks like a good addition.

  1.  I was wondering whether a value can have a type but still be null, for 
example a null value of Float type, and whether we need to tolerate nulls being 
returned from float getFloat(). If so, maybe we should return an object 
Float instead.
  2.  You mention maps in the FLIP text but do not have it as a type. I 
wondered what your thinking is.
  3.  In the new functions PARSE_JSON and TRY_PARSE_JSON, the text says they 
parse to a variant. As we support JSON_OBJECT as well, there could be an 
expectation that json_object would be the return type. Maybe we could 
allow the user to choose what gets returned?
  4.  Can variants be turned into json_objects and vice versa?
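
To illustrate point 1, here is a minimal Java sketch of why a primitive getter cannot tolerate nulls. The FloatHolder class and the getFloatOrNull() method are hypothetical names made up for this example; they are not the FLIP's proposed API.

```java
// Hypothetical sketch for point 1; FloatHolder is NOT part of the FLIP's API.
public class NullableFloatSketch {

    /** Holds a value whose declared type is Float but which may be null. */
    static final class FloatHolder {
        private final Float value; // boxed, so null is representable

        FloatHolder(Float value) {
            this.value = value;
        }

        /** Primitive getter: auto-unboxing a null throws NullPointerException. */
        float getFloat() {
            return value;
        }

        /** Boxed getter: a null value is handed back to the caller as null. */
        Float getFloatOrNull() {
            return value;
        }
    }

    public static void main(String[] args) {
        FloatHolder present = new FloatHolder(1.5f);
        FloatHolder missing = new FloatHolder(null);

        System.out.println(present.getFloat());
        System.out.println(missing.getFloatOrNull());

        try {
            missing.getFloat();
        } catch (NullPointerException e) {
            System.out.println("primitive getter cannot return null");
        }
    }
}
```

With a primitive return type, the caller either gets an exception or has to check for null through a separate call before reading the value; a boxed return type lets null flow through directly.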

Kind regards, David.

From: Xuannan Su <suxuanna...@gmail.com>
Date: Friday, 25 April 2025 at 12:47
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-521: Integrating Variant Type into 
Flink: Enabling Efficient Semi-Structured Data Processing
Hi everyone,

Thank you for all the comments! If there are no further comments, I'd
like to close the discussion and start the voting next Monday.

Best,
Xuannan

On Fri, Apr 25, 2025 at 7:41 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:
>
> +1 for this FLIP. VARIANT type support will be a great addition to SQL.
> Looking forward to the detailed design of the subsequent shredding
> optimizations.
>
>
> Best,
> Lincoln Lee
>
>
> Timo Walther <twal...@apache.org> wrote on Tue, Apr 22, 2025 at 16:51:
>
> > +1 for this feature. Having a VARIANT type makes a lot of sense and
> > together with an OBJECT type will make semi-structured data processing
> > in Flink easier.
> >
> > Currently, I'm catching up with notifications after the Easter holidays,
> > but happy to give some feedback by tomorrow or Thursday as well.
> >
> > Thanks,
> > Timo
> >
> > On 22.04.25 10:40, Jingsong Li wrote:
> > > Thanks Xuannan for driving this discussion.
> > >
> > > At present, communities such as Apache Iceberg, Delta, Spark, Parquet,
> > > etc. are all designing and developing around Variant, and our Flink
> > > support for Variant is very valuable.
> > >
> > > After a rough look at the design, I see no overall problems. It is
> > > designed around Parquet's Variant standard, which is similar to the
> > > overall design of Spark SQL.
> > >
> > > +1 for this.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Apr 14, 2025 at 6:12 PM Xuannan Su <suxuanna...@gmail.com> wrote:
> > >>
> > >> Hi devs,
> > >>
> > >> I’d like to start a discussion around FLIP-521: Integrating Variant
> > >> Type into Flink: Enabling Efficient Semi-Structured Data
> > >> Processing[1]. Working with semi-structured data has long been a
> > >> foundational scenario of the Lakehouse. While JSON has traditionally
> > >> served as the primary storage format for such data, its implementation
> > >> as serialized strings introduces significant inefficiencies.
> > >>
> > >> In this FLIP, we integrate the Variant encoding, which is a compact
> > >> binary representation of semi-structured data[2], to improve the
> > >> performance of processing semi-structured data. As Paimon has
> > >> supported the Variant type recently[3], this FLIP would allow Flink to
> > >> further leverage Paimon's storage-layer optimizations, improving
> > >> performance and resource utilization for semi-structured data
> > >> pipelines.
> > >>
> > >> Best,
> > >> Xuannan
> > >>
> > >> [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-521%3A+Integrating+Variant+Type+into+Flink%3A+Enabling+Efficient+Semi-Structured+Data+Processing
> > >> [2]
> > https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
> > >> [3] https://github.com/apache/paimon/issues/4471
> > >
> >
> >
