Hi all,

I am one of the main developers implementing Variant in Spark. The
specification and all the code are currently merged into the common/variant
<https://github.com/apache/spark/tree/master/common/variant> package in the
Spark repo.

There has been growing interest from other projects (such as Iceberg) in
supporting Variant, and we think that moving the Variant spec and
implementation out to a new home might be the best way for all the
different projects to be able to use and collaborate on Variant. We
originally put all the Variant code under common/variant with the
expectation that eventually it would be moved elsewhere.

We are proposing that we move the Variant spec and implementation out of
the Spark project, to the Parquet project. Spark depends heavily on
Parquet, and the Variant spec contains a lot of details on the physical
storage layer, such as shredding. The Parquet project would be a great
place to standardize the Variant data type, and to enable interoperability
across many different projects. Even after we move Variant out, we
expect to retain compatibility with the current Spark implementation.

What do people think? There are probably many details we still need to
figure out in terms of moving the implementation, but at a high level, does
it make sense to move Variant to Parquet?

I appreciate your feedback!

Thanks,
Gene
