I am not sure a proper writeup is useful quite yet. I have filed a ticket to track the idea [1]. I'll report back here or on that ticket as we work on implementing this in arrow-rs.

Thank you

[1]: https://github.com/apache/arrow/issues/46908

On Wed, Jun 25, 2025 at 12:47 PM Matt Topol <zotthewiz...@gmail.com> wrote:

> We've merged the extension type into the Arrow Go implementation [1]. I've
> been waiting for one of the other implementations to implement the proposal
> [2] before I go make a PR to add it to the docs in full. If you think it's
> worthwhile for me to start drafting up a PR to add to the Canonical
> Extensions right now then I'm happy to do so. I think most of the
> objections to using an extension type instead of a real type are answered
> or managed by the Proposal [2] and by ensuring the extension type has
> appropriate functional support and methods.
>
> I think filing a ticket to track the work makes a lot of sense!
>
> --Matt
>
> [1]: https://github.com/apache/arrow-go/commit/5240503993cc0aa47554b932c341e4940ce42348
> [2]: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing
>
> On Wed, Jun 25, 2025 at 12:38 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> > Did we ever decide that Variant will be an Arrow canonical extension type?
> >
> > I don't see it currently listed in the docs [1]; however, an extension type
> > may have been added to the C++ implementation in [2] (sorry, I am not
> > familiar enough with that codebase to be sure)
> >
> > As I think was mentioned elsewhere there is also a GitHub discussion about
> > adding Variant as a real type [3] that may also be relevant, from Curt.
> >
> > If this is the direction we are heading I will be happy to file a ticket to
> > track the work
> >
> > Andrew
> >
> > [1]: https://arrow.apache.org/docs/format/CanonicalExtensions.html#canonical-extension-types
> > [2]: https://github.com/apache/arrow/pull/45375/files
> > [3]: https://github.com/apache/arrow/issues/42069
> >
> > On Wed, May 21, 2025 at 4:43 AM wish maple <maplewish...@gmail.com> wrote:
> >
> > > When I went through the Parquet Variant spec, I found that an Arrow
> > > extension type might be a must because decoding Parquet row
> > > by row is so inefficient.
> > >
> > > I've drafted a decoding tool in Parquet C++, and it is ready for review now [1]
> > >
> > > [1] https://github.com/apache/arrow/pull/46372
> > >
> > > Best,
> > > Xuwei Fu
> > >
> > > On Fri, May 9, 2025 at 06:03, Matt Topol <zotthewiz...@gmail.com> wrote:
> > >
> > > > Hey All,
> > > >
> > > > There have been various discussions occurring in many different thread
> > > > locations (issues, PRs, and so on) [1][2][3], and more that I haven't
> > > > linked to, concerning what a canonical Variant Extension Type for
> > > > Arrow might look like. As I've looked into implementing some things,
> > > > I've also spoken with members of the Arrow, Iceberg and Parquet
> > > > communities as to what a good representation for Arrow Variant would
> > > > look like in order to ensure good support and adoption.
> > > >
> > > > I also looked at the ClickHouse variant implementation [4]. The
> > > > ClickHouse Variant is nearly equivalent to the Arrow Dense Union type,
> > > > so we don't need to do any extra work there to support it.
> > > >
> > > > So, after discussions and looking into the needs for engines and so
> > > > on, I've iterated and written up a proposal for what a Canonical
> > > > Variant Extension Type for Arrow could be in a google doc [5]. I'm
> > > > hoping that this can spark some discussion and comments on the
> > > > document.
> > > > If there's relative consensus on it, then I'll work on
> > > > creating some implementations of it that I can use to formally propose
> > > > the addition to the Canonical Extensions.
> > > >
> > > > Please take a read and leave comments on the google doc or on this
> > > > thread. Thanks everyone!
> > > >
> > > > --Matt
> > > >
> > > > [1]: https://github.com/apache/arrow-rs/issues/7063
> > > > [2]: https://github.com/apache/arrow/issues/45937
> > > > [3]: https://github.com/apache/arrow/pull/45375#issuecomment-2649807352
> > > > [4]: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
> > > > [5]: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing