Parquet is the right project for standardizing the new Variant data type.

On Wed, Aug 21, 2024 at 3:26 PM huaxin gao <huaxin.ga...@gmail.com> wrote:
> +1 for moving the variant type to Parquet, as it promotes standardization
> and interoperability across numerous projects.
>
> Huaxin
>
> On Wed, Aug 21, 2024 at 1:28 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Agreed that Parquet would be a good place to host the new type. Different
>> table formats, like Iceberg and Delta, can benefit from it, as they are
>> already based on Parquet.
>>
>> Yufei
>>
>> On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos
>> <alkis.evlogime...@databricks.com.invalid> wrote:
>>
>>> +1
>>>
>>> In addition to everything said above, it is also a great opportunity for
>>> wider testing and possibly tweaking the spec before it takes off post
>>> standardization.
>>>
>>> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> I think this would be a great move to encourage all sorts of engines
>>>> and table formats to take advantage of the variant type and make sure
>>>> it remains compatible between all those systems.
>>>>
>>>> I strongly support this,
>>>> Russ
>>>>
>>>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org>
>>>> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I agree the Parquet project is a good place to host and evolve the
>>>>> spec (we could store it in parquet-variant?). We would need to align
>>>>> this with the Parquet project. Anyway, I'm familiar with both Iceberg
>>>>> and Parquet and happy to help where needed.
>>>>>
>>>>> Kind regards,
>>>>> Fokko
>>>>>
>>>>> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin
>>>>> <r...@databricks.com.invalid> wrote:
>>>>>
>>>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up
>>>>>> with two or even more diverging specs for storing variants. It just
>>>>>> adds more work for everybody to interop. Parquet would be a great
>>>>>> home for this spec as a neutral project that almost all the other
>>>>>> important projects in this space depend on as the de facto standard
>>>>>> for physical data encoding and storage. So if we can collaborate with
>>>>>> the Parquet community and get this into Parquet to avoid each project
>>>>>> building its own spec, that'd be amazing.
>>>>>>
>>>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am one of the main developers implementing Variant in Spark. The
>>>>>>> specification and all the code are currently merged into the
>>>>>>> common/variant
>>>>>>> <https://github.com/apache/spark/tree/master/common/variant>
>>>>>>> package in the Spark repo.
>>>>>>>
>>>>>>> There has been growing interest from other projects (such as
>>>>>>> Iceberg) in supporting Variant, and we think that moving the Variant
>>>>>>> spec and implementation out to a new home might be the best way for
>>>>>>> all the different projects to be able to use and collaborate on
>>>>>>> Variant. We originally put all the Variant code under common/variant
>>>>>>> with the expectation that eventually it would be moved elsewhere.
>>>>>>>
>>>>>>> We are proposing that we move the Variant spec and implementation
>>>>>>> out of the Spark project, to the Parquet project. Spark depends
>>>>>>> heavily on Parquet, and the Variant spec contains a lot of details on
>>>>>>> the physical storage layer, such as shredding. The Parquet project
>>>>>>> would be a great place to standardize the Variant data type, and to
>>>>>>> enable interoperability across many different projects. However, even
>>>>>>> when we move Variant out, we expect to retain compatibility with the
>>>>>>> current Spark implementation.
>>>>>>>
>>>>>>> What do people think? There are probably many details we still need
>>>>>>> to figure out in terms of moving the implementation, but at a high
>>>>>>> level, does it make sense to move Variant to Parquet?
>>>>>>>
>>>>>>> I appreciate your feedback!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gene