Re: [DISCUSS] Move Variant to Parquet?

Alkis Evlogimenos Wed, 21 Aug 2024 00:15:08 -0700

+1

In addition to everything said above, it is also a great opportunity for
wider testing and possibly tweaking the spec before it takes off
post-standardization.


On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <[email protected]>
wrote:

> I think this would be a great move to encourage all sorts of engines and
> table formats to take advantage of the variant type and make sure it remains
> compatible between all those systems.
>
> I strongly support this,
> Russ
>
> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <[email protected]> wrote:
>
>> Hey everyone,
>>
>> I agree the Parquet project is a good place to host and evolve the spec
>> (we could store it in parquet-variant?). We would need to align this with
>> the Parquet project. Anyway, I'm familiar with both Iceberg and Parquet and
>> am happy to help where needed.
>>
>> Kind regards,
>> Fokko
>>
>>
>> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin
>> <[email protected]> wrote:
>>
>>> As I said on dev@iceberg, it'd be really unfortunate if we end up with
>>> two or even more diverging specs for storing variants. It just adds more
>>> work for everybody to interop. Parquet would be a great home for this spec
>>> as a neutral project that almost all the other important projects in this
>>> space depend on as the de facto standard for physical data encoding and
>>> storage. So if we can collaborate with the Parquet community and get this
>>> into Parquet to avoid each project building its own spec, that'd be amazing.
>>>
>>>
>>>
>>>
>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am one of the main developers implementing Variant in Spark. The
>>>> specification and all the code are currently merged into the
>>>> common/variant
>>>> <https://github.com/apache/spark/tree/master/common/variant> package
>>>> in the Spark repo.
>>>>
>>>> There has been growing interest from other projects (such as Iceberg)
>>>> in supporting Variant, and we think that moving the Variant spec and
>>>> implementation out to a new home might be the best way for all the
>>>> different projects to be able to use and collaborate on Variant. We
>>>> originally put all the Variant code under common/variant with the
>>>> expectation that eventually it would be moved elsewhere.
>>>>
>>>> We are proposing that we move the Variant spec and implementation out
>>>> of the Spark project, to the Parquet project. Spark depends heavily on
>>>> Parquet, and the Variant spec contains a lot of details on the physical
>>>> storage layer, such as shredding. The Parquet project would be a great
>>>> place to standardize the Variant data type, and to enable interoperability
>>>> across many different projects. However, even when we move Variant out, we
>>>> expect to retain compatibility with the current Spark implementation.
>>>>
>>>> What do people think? There are probably many details we still need to
>>>> figure out in terms of moving the implementation, but at a high level, does
>>>> it make sense to move Variant to Parquet?
>>>>
>>>> I appreciate your feedback!
>>>>
>>>> Thanks,
>>>> Gene
>>>>
>>>
