Parquet is the right project for standardizing the new Variant data type.

On Wed, Aug 21, 2024 at 3:26 PM huaxin gao <huaxin.ga...@gmail.com> wrote:

> +1 for moving the Variant type to Parquet, as it promotes standardization
> and interoperability across numerous projects.
>
> Huaxin
>
> On Wed, Aug 21, 2024 at 1:28 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Agreed that Parquet would be a good place to host the new type. Different
>> table formats, like Iceberg and Delta, can benefit from it because they
>> are already based on Parquet.
>>
>> Yufei
>>
>>
>> On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos
>> <alkis.evlogime...@databricks.com.invalid> wrote:
>>
>>> +1
>>>
>>> In addition to everything said above, it is also a great opportunity for
>>> wider testing and possibly tweaking the spec before it takes off post
>>> standardization.
>>>
>>> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> I think this would be a great move to encourage all sorts of engines
>>>> and table formats to take advantage of variant type and make sure it
>>>> remains compatible between all those systems.
>>>>
>>>> I strongly support this,
>>>> Russ
>>>>
>>>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <fo...@apache.org>
>>>> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I agree the Parquet project is a good place to host and evolve the
>>>>> spec (we could store it in parquet-variant?). We would need to align this
>>>>> with the Parquet project. Anyway, I'm familiar with both Iceberg and
>>>>> Parquet and happy to help where needed.
>>>>>
>>>>> Kind regards,
>>>>> Fokko
>>>>>
>>>>>
>>>>> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin
>>>>> <r...@databricks.com.invalid> wrote:
>>>>>
>>>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up
>>>>>> with two or even more diverging specs for storing variants. It just
>>>>>> creates more interoperability work for everybody. Parquet would be a
>>>>>> great home for this spec as a neutral project that almost all the other
>>>>>> important projects in this space depend on as the de facto standard for
>>>>>> physical data encoding and storage. So if we can collaborate with the
>>>>>> Parquet community and get this into Parquet to avoid each project
>>>>>> building its own spec, that'd be amazing.
>>>>>>
>>>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <gene.p...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am one of the main developers implementing Variant in Spark. The
>>>>>>> specification and all the code are currently merged into the
>>>>>>> common/variant
>>>>>>> <https://github.com/apache/spark/tree/master/common/variant>
>>>>>>> package in the Spark repo.
>>>>>>>
>>>>>>> There has been growing interest from other projects (such as
>>>>>>> Iceberg) in supporting Variant, and we think that moving the Variant
>>>>>>> spec and implementation out to a new home might be the best way for all
>>>>>>> the different projects to be able to use and collaborate on Variant. We
>>>>>>> originally put all the Variant code under common/variant with the
>>>>>>> expectation that eventually it would be moved elsewhere.
>>>>>>>
>>>>>>> We are proposing that we move the Variant spec and implementation
>>>>>>> out of the Spark project to the Parquet project. Spark depends heavily
>>>>>>> on Parquet, and the Variant spec contains a lot of details on the
>>>>>>> physical storage layer, such as shredding. The Parquet project would be
>>>>>>> a great place to standardize the Variant data type and to enable
>>>>>>> interoperability across many different projects. However, even after we
>>>>>>> move Variant out, we expect to retain compatibility with the current
>>>>>>> Spark implementation.
>>>>>>>
>>>>>>> What do people think? There are probably many details we still need
>>>>>>> to figure out in terms of moving the implementation, but at a high
>>>>>>> level, does it make sense to move Variant to Parquet?
>>>>>>>
>>>>>>> I appreciate your feedback!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gene
>>>>>>>
>>>>>>
