No, I think it's fine. Now someone has to write an actual PR for the
format docs, I guess.
Thanks
Antoine.
On 1 July 2025 at 17:44, Andrew Lamb wrote:
The problem is that the discussion is still framed as "Arrow Variant"
type (see mail subject line) but most people seem to be thinking of
canonicalizing a Parquet Variant extension type in Arrow.
I have renamed the ticket [1] to "[Format] Add an Arrow Canonical Extension
Type for Parquet Variant #46908" to try to reduce the confusion.
Antoine, do you have any other recommendations on how to avoid confusion
other than being more precise with the naming?
Andrew
[1]: https://github.com/apache/arrow/issues/46908
On Thu, Jun 26, 2025 at 8:43 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
Note that the extension type that was merged in Go
(https://github.com/apache/arrow-go/blob/c542dd68e2757122ce8ffc15936f2df46664c30c/arrow/extensions/variant.go#L170)
and also the one used in Parquet C++ in the arrow repo both use the
name "parquet.variant", not "arrow.variant".
That could help frame it as "Parquet variant" compatible instead of
*the* Arrow variant type. But from the discussion here (or the Google
doc), it was not clear to me that this is the name being used in the
current implementations, nor whether the proposal is to follow those
implementations or to change them to "arrow.variant" once that has been
voted upon.
Joris
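To make the naming point above concrete: in the Arrow columnar format, an extension type is carried as field metadata under the keys "ARROW:extension:name" and "ARROW:extension:metadata", and canonical extension type names use the reserved "arrow." prefix. The following is a minimal sketch in plain Python (no Arrow library involved). The Field class, the helper functions, the field name and the storage type label are all hypothetical and only serve as illustration; they are not the layout agreed in this thread.

```python
# Sketch only: a plain-Python stand-in for an Arrow field, not a real Arrow API.
from dataclasses import dataclass, field
from typing import Dict, Optional

# Metadata keys defined by the Arrow columnar format for extension types.
EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Hypothetical minimal field: a name, a storage type label, and metadata."""
    name: str
    storage_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def extension_name(f: Field) -> Optional[str]:
    """Return the extension type name carried by the field, or None."""
    return f.metadata.get(EXT_NAME_KEY)


def is_canonical_name(name: str) -> bool:
    """Canonical extension type names use the reserved "arrow." prefix."""
    return name.startswith("arrow.")


# The storage type label below is an assumption made for illustration.
variant_field = Field(
    name="payload",
    storage_type="struct<metadata: binary, value: binary>",
    metadata={EXT_NAME_KEY: "parquet.variant", EXT_META_KEY: ""},
)
```

With this sketch, a consumer that only looks for the "arrow." prefix would not treat "parquet.variant" as a canonical name, which is one way to see why the choice of name matters for how the type is framed.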
On Thu, 26 Jun 2025 at 13:58, Antoine Pitrou <anto...@python.org> wrote:
The problem is that the discussion is still framed as "Arrow Variant"
type (see mail subject line) but most people seem to be thinking of
canonicalizing a Parquet Variant extension type in Arrow.
That confusion should be cleared before we think of moving any further.
Regards
Antoine.
On Wed, 25 Jun 2025 12:38:21 -0400
Andrew Lamb <al...@influxdata.com> wrote:
Did we ever decide that Variant will be an Arrow canonical extension
type?
I don't see it currently listed in the docs [1]; however, an extension
type may have been added to the C++ implementation in [2] (sorry, I am not
familiar enough with that codebase to be sure).
As I think was mentioned elsewhere, there is also a GitHub discussion about
adding Variant as a real type [3] that may also be relevant, from Curt.
If this is the direction we are heading, I will be happy to file a
ticket to track the work.
Andrew
[1]:
https://arrow.apache.org/docs/format/CanonicalExtensions.html#canonical-extension-types
[2]: https://github.com/apache/arrow/pull/45375/files
[3]: https://github.com/apache/arrow/issues/42069
On Wed, May 21, 2025 at 4:43 AM wish maple <maplewish...@gmail.com>
wrote:
When I went through the Parquet Variant spec, I found that an Arrow
extension type might be a must, because decoding Parquet row
by row is so inefficient.
I've drafted a decoding tool in Parquet C++, and it is ready for review now
[1].
[1] https://github.com/apache/arrow/pull/46372
Best,
Xuwei Fu
On Fri, May 9, 2025 at 06:03, Matt Topol <zotthewiz...@gmail.com> wrote:
Hey All,
There have been various discussions occurring in many different
places (issues, PRs, and so on)[1][2][3], and more that I haven't
linked to, concerning what a canonical Variant Extension Type for
Arrow might look like. As I've looked into implementing some things,
I've also spoken with members of the Arrow, Iceberg and Parquet
communities as to what a good representation for Arrow Variant would
look like in order to ensure good support and adoption.
I also looked at the ClickHouse variant implementation [4]. The
ClickHouse Variant is nearly equivalent to the Arrow Dense Union type,
so we don't need to do any extra work there to support it.
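For readers less familiar with the Dense Union layout mentioned above, here is a minimal sketch in plain Python (no Arrow library). A dense union stores one type id and one offset per logical slot, plus one child array per alternative type. The function name and the sample data are hypothetical, and the sketch assumes that each type id is simply the index of its child array.

```python
# Sketch only: resolves the logical values of a dense-union-style layout.
from typing import Any, List, Sequence


def dense_union_values(
    type_ids: Sequence[int],
    offsets: Sequence[int],
    children: Sequence[Sequence[Any]],
) -> List[Any]:
    """Look up each slot's value in the child selected by its type id."""
    if len(type_ids) != len(offsets):
        raise ValueError("type_ids and offsets must have the same length")
    # Assumption: type id == index of the child array.
    return [children[t][o] for t, o in zip(type_ids, offsets)]


# Two alternatives: child 0 holds integers, child 1 holds strings.
int_child = [1, 42]
str_child = ["a", "b"]

# Four logical slots alternating between the two children.
type_ids = [0, 1, 0, 1]
offsets = [0, 0, 1, 1]

values = dense_union_values(type_ids, offsets, [int_child, str_child])
```

Each child stays a homogeneous, densely packed array, which is the property that makes a per-value "one of several types" column map onto this layout without extra work.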
So, after discussions and looking into the needs for engines and so
on, I've iterated and written up a proposal for what a Canonical
Variant Extension Type for Arrow could be in a google doc[5]. I'm
hoping that this can spark some discussion and comments on the
document. If there's relative consensus on it, then I'll work on
creating some implementations of it that I can use to formally propose
the addition to the Canonical Extensions.
Please take a read and leave comments on the google doc or on this
thread. Thanks everyone!
--Matt
[1]: https://github.com/apache/arrow-rs/issues/7063
[2]: https://github.com/apache/arrow/issues/45937
[3]:
https://github.com/apache/arrow/pull/45375#issuecomment-2649807352
[4]:
https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
[5]:
https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing