> *As per Apache Parquet Community Parquet V2 is not final yet so it is not
> official . They are advising not to use Parquet V2 for writing (though code
> is available ) .*

This would be news to me.  Parquet releases are listed (by the Parquet
community) at [1].

The vote to release parquet 2.10 is here: [2]

Neither of these links mentions anything about this being an experimental,
unofficial, or non-finalized release.

I understand your concern.  I believe your quotes are coming from your
discussion on the parquet mailing list here [3].  This communication is
unfortunate and confusing to me as well.

[1] https://parquet.apache.org/blog/
[2] https://lists.apache.org/thread/fdf1zz0f3xzz5zpvo6c811xjswhm1zy6
[3] https://lists.apache.org/thread/4nzroc68czwxnp0ndqz15kp1vhcd7vg3


On Wed, Apr 24, 2024 at 5:10 AM Prem Sahoo <prem.re...@gmail.com> wrote:

> Hello Jacob,
> Thanks for the information, and my apologies for the weird format of my
> email.
>
> This is the email from the Parquet community. May I know why pyarrow is
> using Parquet V2, which is not official yet?
>
> My question stems from the Parquet community's statement that V2 is not
> final yet, so it is not official yet.
> "Hi Prem - Maybe I can help clarify to the best of my knowledge. Parquet V2
> as a standard isn't finalized just yet. Meaning there is no formal,
> *finalized* "contract" that specifies what it means to write data in the V2
> version. The discussions/conversations about what the final V2 standard may
> be are still in progress and are evolving.
>
> That being said, because V2 code does exist (though unfinalized), there are
> clients / tools that are writing data in the un-finalized V2 format, as
> seems to be the case with Dremio.
>
> Now, as that comment you quoted said, you can have Spark write V2 files,
> but it's worth being mindful about the fact that V2 is a moving target and
> can (and likely will) change. You can overwrite parquet.writer.version to
> specify your desired version, but it can be dangerous to produce data in a
> moving-target format. For example, let's say you write a bunch of data in
> Parquet V2, and then the community decides to make a breaking change (which
> is completely fine / allowed since V2 isn't finalized). You are now left
> having to deal with a potentially large and complicated file format update.
> That's why it's not recommended to write files in parquet v2 just yet."
>
>
> *As per Apache Parquet Community Parquet V2 is not final yet so it is not
> official . They are advising not to use Parquet V2 for writing (though code
> is available ) .*
>
>
> *As per above Spark hasn't started using Parquet V2 for writing *.
>
> May I know how an unstable /unofficial  version is being used in pyarrow ?
>
>
> On Wed, Apr 24, 2024 at 12:43 AM Jacob Wujciak <assignu...@apache.org>
> wrote:
>
> > Hello,
> >
> > First off, please try to clean up formatting of emails to be legible when
> > forwarding/quoting previous messages multiple times, especially when most
> > of the quotes do not contain any useful information. It makes it much
> > easier to parse the message and thus quicker to answer.
> >
> > The short answer is that we switched to 2.4 and more recently to 2.6 as
> > the default to enable the usage of features these versions provide. As you
> > have correctly quoted from the docs you can still write 1.0 if you want to
> > ensure compatibility with systems that can not process the 'newer' versions
> > yet (2.6 was released in 2018!).
> >
> > You can find the long form discussions about these changes here:
> > https://issues.apache.org/jira/browse/ARROW-12203
> > https://lists.apache.org/thread/027g366yr3m03hwtpst6sr58b3trwhsm
> >
> > Best
> > Jacob
> >
> > On 2024/04/24 02:32:01 Prem Sahoo wrote:
> > > Hello Team,
> > > Could you please share your thoughts about below questions?
> > > Sent from my iPhone
> > >
> > > Begin forwarded message:
> > >
> > > > From: Prem Sahoo <prem.re...@gmail.com>
> > > > Date: April 23, 2024 at 11:03:48 AM EDT
> > > > To: dev-ow...@arrow.apache.org
> > > > Subject: Re: PyArrow Using Parquet V2
> > > >
> > > > dev@arrow.apache.org
> > > > Sent from my iPhone
> > > >
> > > >>> On Apr 23, 2024, at 6:25 AM, Prem Sahoo <prem.re...@gmail.com> wrote:
> > > >>>
> > > >> Hello Team,
> > > >> Could anyone please help me on below query?
> > > >> Sent from my iPhone
> > > >>
> > > >>>> On Apr 22, 2024, at 10:01 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
> > > >>>>
> > > >>> Sent from my iPhone
> > > >>>
> > > >>>>> On Apr 22, 2024, at 9:51 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
> > > >>>>>
> > > >>>>> Hello Team,
> > > >>>>> I have a question regarding Parquet V2 writing through pyarrow.
> > > >>>>> As per below, pyarrow started writing Parquet in V2 encoding.
> > > >>>>>
> > > >>>>> https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
> > > >>>>>
> > > >>>>> version{"1.0", "2.4", "2.6"}, default "2.6"
> > > >>>>> Determine which Parquet logical types are available for use,
> > > >>>>> whether the reduced set from the Parquet 1.x.x format or the
> > > >>>>> expanded logical types added in later format versions. Files
> > > >>>>> written with version='2.4' or '2.6' may not be readable in all
> > > >>>>> Parquet implementations, so version='1.0' is likely the choice
> > > >>>>> that maximizes file compatibility. UINT32 and some logical types
> > > >>>>> are only available with version '2.4'. Nanosecond timestamps are
> > > >>>>> only available with version '2.6'. Other features such as
> > > >>>>> compression algorithms or the new serialized data page format
> > > >>>>> must be enabled separately (see 'compression' and
> > > >>>>> 'data_page_version').
> > > >>>>>
> > > >>>>>
> > > >>>>> As per Apache Parquet Community Parquet V2 is not final yet so it
> > > >>>>> is not official . They are advising not to use Parquet V2 for
> > > >>>>> writing (though code is available ) .
> > > >>>>>
> > > >>>>> As per above Spark hasn't started using Parquet V2 for writing .
> > > >>>>> May I know how an unstable /unofficial  version is being used in
> > > >>>>> pyarrow ?
> > > >>>>>
> > >
> >
>
