Hi,

There is a relevant discussion on dev@parquet: https://lists.apache.org/thread/6h58hj39lhqtcyd2hlsyvqm4lzdh4b9z
The feedback looks promising. Looking forward to cooperating with the Spark community!

Best regards,
Gang

On Thu, Aug 22, 2024 at 10:20 PM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:

> +1 on moving this to the Parquet project/community (assuming that the Parquet community is ok with this)
>
> On Thu, Aug 22, 2024 at 3:02 AM Chao Sun <sunc...@apache.org> wrote:
>
>> +1 too
>>
>> On Wed, Aug 21, 2024 at 4:43 PM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>
>>> +1 for moving the Variant type to Parquet, as it promotes standardization and interoperability across numerous projects.
>>>
>>> Huaxin
>>>
>>> On Wed, Aug 21, 2024 at 1:28 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Agreed that Parquet would be a good place to host the new type. Different table formats, like Iceberg and Delta, can benefit from it, as they are already based on Parquet.
>>>>
>>>> Yufei
>>>>
>>>> On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos <alkis.evlogime...@databricks.com.invalid> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> In addition to everything said above, it is also a great opportunity for wider testing and possibly tweaking the spec before it takes off post-standardization.
>>>>>
>>>>> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>
>>>>>> I think this would be a great move to encourage all sorts of engines and table formats to take advantage of the Variant type and make sure it remains compatible between all those systems.
>>>>>>
>>>>>> I strongly support this,
>>>>>> Russ
>>>>>>
>>>>>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I agree the Parquet project is a good place to host and evolve the spec (we could store it in parquet-variant?). We would need to align this with the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet and happy to help where needed.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Fokko
>>>>>>>
>>>>>>> On Mon, Aug 19, 2024 at 16:36, Reynold Xin <r...@databricks.com.invalid> wrote:
>>>>>>>
>>>>>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up with two or even more diverging specs for storing variants. It just adds more work for everybody to interop. Parquet would be a great home for this spec as a neutral project that almost all the other important projects in this space depend on as the de facto standard for physical data encoding and storage. So if we can collaborate with the Parquet community and get this into Parquet to avoid each project building its own spec, that'd be amazing.
>>>>>>>>
>>>>>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am one of the main developers implementing Variant in Spark. The specification and all the code are currently merged into the common/variant <https://github.com/apache/spark/tree/master/common/variant> package in the Spark repo.
>>>>>>>>>
>>>>>>>>> There has been growing interest from other projects (such as Iceberg) in supporting Variant, and we think that moving the Variant spec and implementation out to a new home might be the best way for all the different projects to be able to use and collaborate on Variant. We originally put all the Variant code under common/variant with the expectation that it would eventually be moved elsewhere.
>>>>>>>>>
>>>>>>>>> We are proposing that we move the Variant spec and implementation out of the Spark project and into the Parquet project. Spark depends heavily on Parquet, and the Variant spec contains a lot of details on the physical storage layer, such as shredding. The Parquet project would be a great place to standardize the Variant data type and to enable interoperability across many different projects. However, even after we move Variant out, we expect to retain compatibility with the current Spark implementation.
>>>>>>>>>
>>>>>>>>> What do people think? There are probably many details we still need to figure out in terms of moving the implementation, but at a high level, does it make sense to move Variant to Parquet?
>>>>>>>>>
>>>>>>>>> I appreciate your feedback!
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Gene