+1. In addition to everything said above, this is also a great opportunity for wider testing, and possibly for tweaking the spec before it takes off after standardization.
On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
> I think this would be a great move to encourage all sorts of engines and
> table formats to take advantage of the variant type and make sure it remains
> compatible between all those systems.
>
> I strongly support this,
> Russ
>
> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> Hey everyone,
>>
>> I agree the Parquet project is a good place to host and evolve the spec
>> (we could store it in parquet-variant?). We would need to align this with
>> the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet and
>> happy to help where needed.
>>
>> Kind regards,
>> Fokko
>>
>>
>> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin
>> <r...@databricks.com.invalid> wrote:
>>
>>> As I said on dev@iceberg, it'd be really unfortunate if we end up with
>>> two or even more diverging specs for storing variants. It just adds more
>>> work for everybody to interoperate. Parquet would be a great home for this
>>> spec as a neutral project that almost all the other important projects in
>>> this space depend on as the de facto standard for physical data encoding
>>> and storage. So if we can collaborate with the Parquet community and get
>>> this into Parquet to avoid each project building its own spec, that'd be
>>> amazing.
>>>
>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am one of the main developers implementing Variant in Spark. The
>>>> specification and all the code are currently merged into the
>>>> common/variant
>>>> <https://github.com/apache/spark/tree/master/common/variant> package
>>>> in the Spark repo.
>>>>
>>>> There has been growing interest from other projects (such as Iceberg)
>>>> in supporting Variant, and we think that moving the Variant spec and
>>>> implementation out to a new home might be the best way for all the
>>>> different projects to be able to use and collaborate on Variant. We
>>>> originally put all the Variant code under common/variant with the
>>>> expectation that eventually it would be moved elsewhere.
>>>>
>>>> We are proposing that we move the Variant spec and implementation out
>>>> of the Spark project, to the Parquet project. Spark depends heavily on
>>>> Parquet, and the Variant spec contains a lot of details on the physical
>>>> storage layer, such as shredding. The Parquet project would be a great
>>>> place to standardize the Variant data type, and to enable interoperability
>>>> across many different projects. However, even when we move Variant out, we
>>>> expect to retain compatibility with the current Spark implementation.
>>>>
>>>> What do people think? There are probably many details we still need to
>>>> figure out in terms of moving the implementation, but at a high level, does
>>>> it make sense to move Variant to Parquet?
>>>>
>>>> I appreciate your feedback!
>>>>
>>>> Thanks,
>>>> Gene