> *As per the Apache Parquet community, Parquet V2 is not final yet, so it is not
> official. They are advising not to use Parquet V2 for writing (though code
> is available).*
This would be news to me. Parquet releases are listed (by the Parquet
community) at [1]. The vote to release Parquet 2.10 is here: [2]. Neither of
these links mentions anything about this being an experimental, unofficial,
or non-finalized release.

I understand your concern. I believe your quotes are coming from your
discussion on the Parquet mailing list here: [3]. This communication is
unfortunate and confusing to me as well.

[1] https://parquet.apache.org/blog/
[2] https://lists.apache.org/thread/fdf1zz0f3xzz5zpvo6c811xjswhm1zy6
[3] https://lists.apache.org/thread/4nzroc68czwxnp0ndqz15kp1vhcd7vg3

On Wed, Apr 24, 2024 at 5:10 AM Prem Sahoo <prem.re...@gmail.com> wrote:

> Hello Jacob,
> Thanks for the information, and my apologies for the weird format of my
> email.
>
> This is the email from the Parquet community. May I know why pyarrow is
> using Parquet V2, which is not official yet?
>
> My question is: according to the Parquet community, V2 is not final yet,
> so it is not official yet.
>
> "Hi Prem - Maybe I can help clarify to the best of my knowledge. Parquet V2
> as a standard isn't finalized just yet. Meaning there is no formal,
> *finalized* "contract" that specifies what it means to write data in the V2
> version. The discussions/conversations about what the final V2 standard may
> be are still in progress and are evolving.
>
> That being said, because V2 code does exist (though unfinalized), there are
> clients / tools that are writing data in the un-finalized V2 format, as
> seems to be the case with Dremio.
>
> Now, as that comment you quoted said, you can have Spark write V2 files,
> but it's worth being mindful of the fact that V2 is a moving target and
> can (and likely will) change. You can override parquet.writer.version to
> specify your desired version, but it can be dangerous to produce data in a
> moving-target format.
> For example, let's say you write a bunch of data in
> Parquet V2, and then the community decides to make a breaking change (which
> is completely fine / allowed since V2 isn't finalized). You are now left
> having to deal with a potentially large and complicated file format update.
> That's why it's not recommended to write files in Parquet V2 just yet."
>
> *As per the Apache Parquet community, Parquet V2 is not final yet, so it is
> not official. They are advising not to use Parquet V2 for writing (though
> code is available).*
>
> *As per the above, Spark hasn't started using Parquet V2 for writing.*
>
> May I know how an unstable/unofficial version is being used in pyarrow?
>
> On Wed, Apr 24, 2024 at 12:43 AM Jacob Wujciak <assignu...@apache.org>
> wrote:
>
> > Hello,
> >
> > First off, please try to clean up the formatting of emails to be legible
> > when forwarding/quoting previous messages multiple times, especially when
> > most of the quotes do not contain any useful information. It makes it much
> > easier to parse the message and thus quicker to answer.
> >
> > The short answer is that we switched to 2.4 and more recently to 2.6 as
> > the default to enable the usage of features these versions provide. As you
> > have correctly quoted from the docs, you can still write 1.0 if you want to
> > ensure compatibility with systems that cannot process the 'newer' versions
> > yet (2.6 was released in 2018!).
> >
> > You can find the long-form discussions about these changes here:
> > https://issues.apache.org/jira/browse/ARROW-12203
> > https://lists.apache.org/thread/027g366yr3m03hwtpst6sr58b3trwhsm
> >
> > Best
> > Jacob
> >
> > On 2024/04/24 02:32:01 Prem Sahoo wrote:
> > > Hello Team,
> > > Could you please share your thoughts about the questions below?
> > > Sent from my iPhone
> > >
> > > Begin forwarded message:
> > >
> > > > From: Prem Sahoo <prem.re...@gmail.com>
> > > > Date: April 23, 2024 at 11:03:48 AM EDT
> > > > To: dev-ow...@arrow.apache.org
> > > > Subject: Re: PyArrow Using Parquet V2
> > > >
> > > > dev@arrow.apache.org
> > > >
> > > > On Apr 23, 2024, at 6:25 AM, Prem Sahoo <prem.re...@gmail.com> wrote:
> > > >>
> > > >> Hello Team,
> > > >> Could anyone please help me with the query below?
> > > >>
> > > >> On Apr 22, 2024, at 10:01 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
> > > >>>
> > > >>> On Apr 22, 2024, at 9:51 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
> > > >>>>
> > > >>>> Hello Team,
> > > >>>> I have a question regarding Parquet V2 writing through pyarrow.
> > > >>>> As per the docs below, pyarrow started writing Parquet in V2 encoding.
> > > >>>> https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
> > > >>>>
> > > >>>> version {"1.0", "2.4", "2.6"}, default "2.6"
> > > >>>> Determine which Parquet logical types are available for use,
> > > >>>> whether the reduced set from the Parquet 1.x.x format or the
> > > >>>> expanded logical types added in later format versions. Files
> > > >>>> written with version='2.4' or '2.6' may not be readable in all
> > > >>>> Parquet implementations, so version='1.0' is likely the choice that
> > > >>>> maximizes file compatibility. UINT32 and some logical types are
> > > >>>> only available with version '2.4'. Nanosecond timestamps are only
> > > >>>> available with version '2.6'. Other features such as compression
> > > >>>> algorithms or the new serialized data page format must be enabled
> > > >>>> separately (see 'compression' and 'data_page_version').
> > > >>>>
> > > >>>> As per the Apache Parquet community, Parquet V2 is not final yet,
> > > >>>> so it is not official.
> > > >>>> They are advising not to use Parquet V2 for writing (though code
> > > >>>> is available).
> > > >>>>
> > > >>>> As per the above, Spark hasn't started using Parquet V2 for writing.
> > > >>>> May I know how an unstable/unofficial version is being used in
> > > >>>> pyarrow?