Re: [DISCUSS] Move Variant to Parquet?

Gang Wu Sun, 25 Aug 2024 23:55:28 -0700

Hi,

There is a related discussion on dev@parquet:
https://lists.apache.org/thread/6h58hj39lhqtcyd2hlsyvqm4lzdh4b9z


The feedback looks promising. Looking forward to cooperating with the Spark
community!

Best regards,
Gang

On Thu, Aug 22, 2024 at 10:20 PM Eduard Tudenhöfner <
[email protected]> wrote:

> +1 on moving this to the Parquet project/community (assuming that the
> Parquet community is ok with this)
>
> On Thu, Aug 22, 2024 at 3:02 AM Chao Sun <[email protected]> wrote:
>
>> +1 too
>>
>> On Wed, Aug 21, 2024 at 4:43 PM huaxin gao <[email protected]>
>> wrote:
>>
>>> +1 for moving variant type to Parquet, as it promotes standardization
>>> and interoperability across numerous projects.
>>>
>>> Huaxin
>>>
>>> On Wed, Aug 21, 2024 at 1:28 PM Yufei Gu <[email protected]> wrote:
>>>
>>>> Agreed that Parquet would be a good place to host the new type.
>>>> Different table formats, like Iceberg and Delta, can benefit from it as they
>>>> are already based on Parquet.
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Wed, Aug 21, 2024 at 12:15 AM Alkis Evlogimenos
>>>> <[email protected]> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> In addition to everything said above, it is also a great opportunity
>>>>> for wider testing and possibly tweaking the spec before it takes off
>>>>> post-standardization.
>>>>>
>>>>> On Tue, Aug 20, 2024 at 4:36 PM Russell Spitzer <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I think this would be a great move to encourage all sorts of engines
>>>>>> and table formats to take advantage of variant type and make sure it
>>>>>> remains compatible between all those systems.
>>>>>>
>>>>>> I strongly support this,
>>>>>> Russ
>>>>>>
>>>>>> On Tue, Aug 20, 2024 at 8:06 AM Fokko Driesprong <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I agree the Parquet project is a good place to host and evolve the
>>>>>>> spec (we could store it in parquet-variant?). We would need to align this
>>>>>>> with the Parquet project. Anyway, I'm familiar both with Iceberg and
>>>>>>> Parquet and happy to help where needed.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Fokko
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 19, 2024 at 4:36 PM Reynold Xin
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> As I said on dev@iceberg, it'd be really unfortunate if we end up
>>>>>>>> with two or even more diverging specs for storing variants. It just adds
>>>>>>>> more work for everybody to interop. Parquet would be a great home for this
>>>>>>>> spec as a neutral project that almost all the other important projects in
>>>>>>>> this space depend on as the de facto standard for physical data encoding
>>>>>>>> and storage. So if we can collaborate with the Parquet community and get
>>>>>>>> this into Parquet to avoid each project building its own spec, that'd be
>>>>>>>> amazing.
>>>>>>>>
>>>>>>>> On Sat, Aug 17, 2024 at 2:56 AM Gene Pang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I am one of the main developers implementing Variant in Spark. The
>>>>>>>>> specification and all the code are currently merged into the
>>>>>>>>> common/variant
>>>>>>>>> <https://github.com/apache/spark/tree/master/common/variant>
>>>>>>>>> package in the Spark repo.
>>>>>>>>>
>>>>>>>>> There has been growing interest from other projects (such as
>>>>>>>>> Iceberg) in supporting Variant, and we think that moving the Variant spec
>>>>>>>>> and implementation out to a new home might be the best way for all the
>>>>>>>>> different projects to be able to use and collaborate on Variant. We
>>>>>>>>> originally put all the Variant code under common/variant with the
>>>>>>>>> expectation that eventually it would be moved elsewhere.
>>>>>>>>>
>>>>>>>>> We are proposing that we move the Variant spec and implementation
>>>>>>>>> out of the Spark project, to the Parquet project. Spark depends heavily on
>>>>>>>>> Parquet, and the Variant spec contains a lot of details on the physical
>>>>>>>>> storage layer, such as shredding. The Parquet project would be a great
>>>>>>>>> place to standardize the Variant data type, and to enable interoperability
>>>>>>>>> across many different projects. However, even when we move Variant out, we
>>>>>>>>> expect to retain the compatibility with the current Spark implementation.
>>>>>>>>>
>>>>>>>>> What do people think? There are probably many details we still
>>>>>>>>> need to figure out in terms of moving the implementation, but at a
>>>>>>>>> high-level, does it make sense to move Variant to Parquet?
>>>>>>>>>
>>>>>>>>> I appreciate your feedback!
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Gene
>>>>>>>>>
>>>>>>>>
