Is there any specific reason Spark does not support Parquet V2, or that the
community doesn't want to move to it? V2 is more optimized, and reads and
writes are much faster (in another component I am using).
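For reference, here is a minimal sketch (Python, standard library only) of how the Parquet writer version could be selected from Spark. It assumes two things: parquet-mr reads the Hadoop property `parquet.writer.version`, which accepts `v1`/`v2` or `PARQUET_1_0`/`PARQUET_2_0` (v1 is its default), and Spark copies every `spark.hadoop.*` setting into the Hadoop configuration with that prefix removed. The helpers `parquet_writer_conf` and `as_submit_args` are hypothetical names used only for illustration. Note Ryan's advice later in this thread against producing v2 encodings.

```python
from typing import Dict, List

# Assumption: parquet-mr accepts either the short name or the enum name of
# its WriterVersion; both spellings are normalized to the short name here.
_WRITER_VERSIONS = {
    "v1": "v1",
    "PARQUET_1_0": "v1",
    "v2": "v2",
    "PARQUET_2_0": "v2",
}


def parquet_writer_conf(version: str) -> Dict[str, str]:
    """Hypothetical helper: Spark conf entry that selects the writer version.

    Spark strips the "spark.hadoop." prefix and copies the rest into the
    Hadoop Configuration, where parquet-mr looks up "parquet.writer.version".
    """
    if version not in _WRITER_VERSIONS:
        raise ValueError(f"unknown Parquet writer version: {version!r}")
    return {"spark.hadoop.parquet.writer.version": _WRITER_VERSIONS[version]}


def as_submit_args(conf: Dict[str, str]) -> List[str]:
    """Hypothetical helper: render conf entries as spark-submit --conf args."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # v2 is shown only because it was asked about; v1 remains the default.
    print(" ".join(as_submit_args(parquet_writer_conf("v2"))))
```

In PySpark, the same entry could instead be passed when building the session, e.g. `SparkSession.builder.config("spark.hadoop.parquet.writer.version", "v2")`. Reading files that use v2 encodings needs no such setting.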

On Mon, Apr 15, 2024 at 7:55 PM Ryan Blue <b...@tabular.io> wrote:

> Spark will read data written with v2 encodings just fine. You just don't
> need to worry about making Spark produce v2. And you should probably also
> not produce v2 encodings from other systems.
>
> On Mon, Apr 15, 2024 at 4:37 PM Prem Sahoo <prem.re...@gmail.com> wrote:
>
>> Oops, so Spark does not support Parquet V2 at the moment? We have a use
>> case where we need Parquet V2, as one of our components uses Parquet V2.
>>
>> On Mon, Apr 15, 2024 at 7:09 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Hi Prem,
>>>
>>> Parquet v1 is the default because v2 has not been finalized and adopted
>>> by the community. I highly recommend not using v2 encodings at this time.
>>>
>>> Ryan
>>>
>>> On Mon, Apr 15, 2024 at 3:05 PM Prem Sahoo <prem.re...@gmail.com> wrote:
>>>
>>>> I am using Spark 3.2.0, but my Spark package comes with parquet-mr
>>>> 1.2.1, which writes Parquet version 1, not version 2 :(. So I was
>>>> looking for how to write Parquet version 2.
>>>>
>>>> On Mon, Apr 15, 2024 at 5:05 PM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Sorry, you have a point there. It was released in version 3.0.0. What
>>>>> version of Spark are you using?
>>>>>
>>>>> Technologist | Solutions Architect | Data Engineer  | Generative AI
>>>>> London
>>>>> United Kingdom
>>>>>
>>>>>
>>>>>    view my Linkedin profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>>
>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>> that, as with any advice, quote "one test result is worth one-thousand
>>>>> expert opinions" (Wernher von Braun
>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>
>>>>>
>>>>> On Mon, 15 Apr 2024 at 21:33, Prem Sahoo <prem.re...@gmail.com> wrote:
>>>>>
>>>>>> Thank you so much for the info! But are there any release notes
>>>>>> saying that Spark 2.4.0 onwards supports Parquet version 2? I was under
>>>>>> the impression that support started with Spark 3.0.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 15, 2024 at 4:28 PM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Well, if I am correct, Parquet version 2 support was introduced in
>>>>>>> Spark version 2.4.0. Therefore, any version of Spark starting from 2.4.0
>>>>>>> supports Parquet version 2. Assuming that you are using Spark version
>>>>>>> 2.4.0 or later, you should be able to take advantage of Parquet version 2
>>>>>>> features.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 15 Apr 2024 at 20:53, Prem Sahoo <prem.re...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you for the information!
>>>>>>>> I can use any version of parquet-mr to produce Parquet files.
>>>>>>>>
>>>>>>>> Regarding the second question:
>>>>>>>> Which version of Spark supports Parquet version 2?
>>>>>>>> Could I get the release notes where Parquet versions are mentioned?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Apr 15, 2024 at 2:34 PM Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Parquet-mr is a Java library that provides functionality for
>>>>>>>>> working with Parquet files in Hadoop. It is therefore more geared
>>>>>>>>> towards working with Parquet files within the Hadoop ecosystem,
>>>>>>>>> particularly in MapReduce jobs. There is no definitive way to check
>>>>>>>>> exact compatible versions within the library itself. However, you can
>>>>>>>>> have a look at this:
>>>>>>>>>
>>>>>>>>> https://github.com/apache/parquet-mr/blob/master/CHANGES.md
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> Mich Talebzadeh,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 15 Apr 2024 at 18:59, Prem Sahoo <prem.re...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Team,
>>>>>>>>>> How can I check which version of Parquet is supported by
>>>>>>>>>> parquet-mr 1.2.1?
>>>>>>>>>>
>>>>>>>>>> Which version of parquet-mr supports Parquet version 2 (V2)?
>>>>>>>>>>
>>>>>>>>>> Which version of Spark supports Parquet version 2?
>>>>>>>>>> Could I get the release notes where Parquet versions are mentioned?
>>>>>>>>>>
>>>>>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>
>
