Is there any specific reason why Spark does not support Parquet V2, or why the community does not want to move to it? It is more optimized, and reads and writes are much faster (judging from another component I am using).
On Mon, Apr 15, 2024 at 7:55 PM Ryan Blue <b...@tabular.io> wrote:

> Spark will read data written with v2 encodings just fine. You just don't
> need to worry about making Spark produce v2. And you should probably also
> not produce v2 encodings from other systems.
>
> On Mon, Apr 15, 2024 at 4:37 PM Prem Sahoo <prem.re...@gmail.com> wrote:
>
>> Oops, so Spark does not support Parquet V2 at the moment? We have a use
>> case where we need Parquet V2, as one of our components uses Parquet V2.
>>
>> On Mon, Apr 15, 2024 at 7:09 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Hi Prem,
>>>
>>> Parquet v1 is the default because v2 has not been finalized and adopted
>>> by the community. I highly recommend not using v2 encodings at this time.
>>>
>>> Ryan
>>>
>>> On Mon, Apr 15, 2024 at 3:05 PM Prem Sahoo <prem.re...@gmail.com> wrote:
>>>
>>>> I am using Spark 3.2.0, but my Spark package comes with parquet-mr
>>>> 1.2.1, which writes Parquet version 1, not version 2 :(. So I was
>>>> looking for how to write Parquet version 2.
>>>>
>>>> On Mon, Apr 15, 2024 at 5:05 PM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Sorry, you have a point there. It was released in version 3.00. What
>>>>> version of Spark are you using?
>>>>>
>>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>>> London
>>>>> United Kingdom
>>>>>
>>>>> view my Linkedin profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>
>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>> that, as with any advice, quote "one test result is worth one-thousand
>>>>> expert opinions (Wernher Von Braun
>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>>>>>
>>>>> On Mon, 15 Apr 2024 at 21:33, Prem Sahoo <prem.re...@gmail.com> wrote:
>>>>>
>>>>>> Thank you so much for the info! But are there any release notes
>>>>>> saying that Spark 2.4.0 onwards supports Parquet version 2? I was under
>>>>>> the impression that support started with Spark 3.0.
>>>>>>
>>>>>> On Mon, Apr 15, 2024 at 4:28 PM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Well, if I am correct, Parquet version 2 support was introduced in
>>>>>>> Spark version 2.4.0. Therefore, any version of Spark starting from 2.4.0
>>>>>>> supports Parquet version 2. Assuming that you are using Spark version
>>>>>>> 2.4.0 or later, you should be able to take advantage of Parquet version 2
>>>>>>> features.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Mich Talebzadeh
>>>>>>>
>>>>>>> On Mon, 15 Apr 2024 at 20:53, Prem Sahoo <prem.re...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you for the information!
>>>>>>>> I can use any version of parquet-mr to produce a Parquet file.
>>>>>>>>
>>>>>>>> Regarding the second question:
>>>>>>>> Which version of Spark supports Parquet version 2?
>>>>>>>> May I get the release notes where Parquet versions are mentioned?
>>>>>>>>
>>>>>>>> On Mon, Apr 15, 2024 at 2:34 PM Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Parquet-mr is a Java library that provides functionality for
>>>>>>>>> working with Parquet files with Hadoop. It is therefore more geared
>>>>>>>>> towards working with Parquet files within the Hadoop ecosystem,
>>>>>>>>> particularly using MapReduce jobs. There is no definitive way to check
>>>>>>>>> exact compatible versions within the library itself. However, you can
>>>>>>>>> have a look at this:
>>>>>>>>>
>>>>>>>>> https://github.com/apache/parquet-mr/blob/master/CHANGES.md
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> Mich Talebzadeh
>>>>>>>>>
>>>>>>>>> On Mon, 15 Apr 2024 at 18:59, Prem Sahoo <prem.re...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Team,
>>>>>>>>>> May I know how to check which version of Parquet is supported by
>>>>>>>>>> parquet-mr 1.2.1?
>>>>>>>>>>
>>>>>>>>>> Which version of parquet-mr supports Parquet version 2 (V2)?
>>>>>>>>>>
>>>>>>>>>> Which version of Spark supports Parquet version 2?
>>>>>>>>>> May I get the release notes where Parquet versions are mentioned?
>>>>>>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>
> --
> Ryan Blue
> Tabular