Hey everyone,

I agree the Parquet project is a good place to host and evolve the spec (we could store it in parquet-variant?). We would need to align this with the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet, and I'm happy to help where needed.

Kind regards,
Fokko

On Mon, Aug 19, 2024 at 16:36, Reynold Xin <r...@databricks.com.invalid> wrote:

> As I said on dev@iceberg, it'd be really unfortunate if we end up with
> two or even more diverging specs for storing variants. It just adds more
> work for everybody to interop. Parquet would be a great home for this spec
> as a neutral project that almost all the other important projects in this
> space depend on as the de facto standard for physical data encoding and
> storage. So if we can collaborate with the Parquet community and get this
> into Parquet to avoid each project building its own spec, that'd be amazing.
>
> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am one of the main developers implementing Variant in Spark. The
>> specification and all the code are currently merged into the
>> common/variant
>> <https://github.com/apache/spark/tree/master/common/variant> package in
>> the Spark repo.
>>
>> There has been growing interest from other projects (such as Iceberg) in
>> supporting Variant, and we think that moving the Variant spec and
>> implementation out to a new home might be the best way for all the
>> different projects to be able to use and collaborate on Variant. We
>> originally put all the Variant code under common/variant with the
>> expectation that eventually it would be moved elsewhere.
>>
>> We are proposing that we move the Variant spec and implementation out of
>> the Spark project, to the Parquet project. Spark depends heavily on
>> Parquet, and the Variant spec contains a lot of details on the physical
>> storage layer, such as shredding. The Parquet project would be a great
>> place to standardize the Variant data type, and to enable interoperability
>> across many different projects. However, even when we move Variant out, we
>> expect to retain the compatibility with the current Spark implementation.
>>
>> What do people think? There are probably many details we still need to
>> figure out in terms of moving the implementation, but at a high-level, does
>> it make sense to move Variant to Parquet?
>>
>> I appreciate your feedback!
>>
>> Thanks,
>> Gene