> The problem is that the discussion is still framed as "Arrow Variant"
> type (see mail subject line) but most people seem to be thinking of
> canonicalizing a Parquet Variant extension type in Arrow.

I have renamed the ticket [1] to "[Format] Add an Arrow Canonical Extension
Type for Parquet Variant #46908" to try to reduce the confusion.

Antoine, do you have any other recommendations on how to avoid confusion
other than being more precise with the naming?

Andrew

[1]: https://github.com/apache/arrow/issues/46908


On Thu, Jun 26, 2025 at 8:43 AM Joris Van den Bossche
<jorisvandenboss...@gmail.com> wrote:

> Note that the extension type that was merged in Go
> (https://github.com/apache/arrow-go/blob/c542dd68e2757122ce8ffc15936f2df46664c30c/arrow/extensions/variant.go#L170)
> and also the one used in Parquet C++ in the arrow repo is using the
> name "parquet.variant", not "arrow.variant".
>
> That could help frame it as "Parquet variant" compatible instead of
> *the* Arrow variant type. But from the discussion here (or the Google
> doc), it was not clear to me that this is the name used in the
> current implementations, nor whether the proposal is to follow those
> implementations or to change them to "arrow.variant" once this is
> voted upon.
>
> Joris
>
> On Thu, 26 Jun 2025 at 13:58, Antoine Pitrou <anto...@python.org> wrote:
> >
> >
> > The problem is that the discussion is still framed as "Arrow Variant"
> > type (see mail subject line) but most people seem to be thinking of
> > canonicalizing a Parquet Variant extension type in Arrow.
> >
> > That confusion should be cleared up before we think of moving any further.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On Wed, 25 Jun 2025 12:38:21 -0400
> > Andrew Lamb <al...@influxdata.com> wrote:
> > > Did we ever decide that Variant will be an Arrow canonical extension
> > > type?
> > >
> > > I don't see it currently listed in the docs [1]; however, an extension
> > > type may have been added to the C++ implementation in [2] (sorry, I am
> > > not familiar enough with that codebase to be sure).
> > >
> > > As I think was mentioned elsewhere, there is also a GitHub discussion
> > > from Curt about adding Variant as a real type [3] that may be relevant.
> > >
> > > If this is the direction we are heading, I will be happy to file a
> > > ticket to track the work.
> > >
> > > Andrew
> > >
> > > [1]: https://arrow.apache.org/docs/format/CanonicalExtensions.html#canonical-extension-types
> > > [2]: https://github.com/apache/arrow/pull/45375/files
> > > [3]: https://github.com/apache/arrow/issues/42069
> > >
> > > On Wed, May 21, 2025 at 4:43 AM wish maple <maplewish...@gmail.com> wrote:
> > >
> > > > When I went through the Parquet Variant spec, I found that an Arrow
> > > > extension type might be a must, because decoding Parquet row
> > > > by row is so inefficient.
> > > >
> > > > I've drafted a decoding tool in Parquet C++, and it is ready for
> > > > review now [1].
> > > >
> > > > [1] https://github.com/apache/arrow/pull/46372
> > > >
> > > > Best,
> > > > Xuwei Fu
> > > >
> > > > On Fri, May 9, 2025 at 6:03 AM Matt Topol <zotthewiz...@gmail.com> wrote:
> > > >
> > > > > Hey All,
> > > > >
> > > > > There have been various discussions across many different
> > > > > locations (issues, PRs, and so on) [1][2][3], and more that I
> > > > > haven't linked to, concerning what a canonical Variant extension
> > > > > type for Arrow might look like. As I've looked into implementing
> > > > > some things, I've also spoken with members of the Arrow, Iceberg
> > > > > and Parquet communities about what a good representation for an
> > > > > Arrow Variant would be, in order to ensure good support and
> > > > > adoption.
> > > > >
> > > > > I also looked at the ClickHouse Variant implementation [4]. The
> > > > > ClickHouse Variant is nearly equivalent to the Arrow Dense Union
> > > > > type, so we don't need to do any extra work there to support it.
> > > > >
> > > > > So, after these discussions and after looking into the needs of
> > > > > engines and so on, I've iterated on and written up a proposal for
> > > > > what a canonical Variant extension type for Arrow could be in a
> > > > > Google doc [5]. I'm hoping that this can spark some discussion and
> > > > > comments on the document. If there's relative consensus on it,
> > > > > I'll work on creating some implementations of it that I can use to
> > > > > formally propose adding it to the canonical extensions.
> > > > >
> > > > > Please take a read and leave comments on the google doc or on this
> > > > > thread. Thanks everyone!
> > > > >
> > > > > --Matt
> > > > >
> > > > > [1]: https://github.com/apache/arrow-rs/issues/7063
> > > > > [2]: https://github.com/apache/arrow/issues/45937
> > > > > [3]: https://github.com/apache/arrow/pull/45375#issuecomment-2649807352
> > > > > [4]: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
> > > > > [5]: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing
