Re: Fwd: PyArrow Using Parquet V2

Jacob Wujciak Tue, 23 Apr 2024 21:43:23 -0700

Hello,

First off, please try to clean up the formatting of emails so they stay legible when 
forwarding/quoting previous messages multiple times, especially when most of 
the quotes do not contain any useful information. That makes the message much 
easier to parse and thus quicker to answer.


The short answer is that we switched the default to 2.4, and more recently to 
2.6, to enable the features these versions provide. As you correctly quoted 
from the docs, you can still write 1.0 if you want to ensure compatibility 
with systems that cannot yet process the 'newer' versions (2.6 was released 
in 2018!).

You can find the long form discussions about these changes here:
https://issues.apache.org/jira/browse/ARROW-12203
https://lists.apache.org/thread/027g366yr3m03hwtpst6sr58b3trwhsm

Best
Jacob

On 2024/04/24 02:32:01 Prem Sahoo wrote:
> Hello Team,
> Could you please share your thoughts on the questions below?
> 
> Begin forwarded message:
> 
> > From: Prem Sahoo <prem.re...@gmail.com>
> > Date: April 23, 2024 at 11:03:48 AM EDT
> > To: dev-ow...@arrow.apache.org
> > Subject: Re: PyArrow Using Parquet V2
> > 
> > dev@arrow.apache.org
> > 
> >>> On Apr 23, 2024, at 6:25 AM, Prem Sahoo <prem.re...@gmail.com> wrote:
> >>> 
> >> Hello Team,
> >> Could anyone please help me with the query below?
> >> 
> >>>> On Apr 22, 2024, at 10:01 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
> >>>> 
> >>> 
> >>> 
> >>>>> On Apr 22, 2024, at 9:51 PM, Prem Sahoo <prem.re...@gmail.com> wrote:
> >>>>> 
> >>>> 
> >>>> 
> >>>>> 
> >>>>> 
> >>>>> Hello Team,
> >>>>> I have a question regarding Parquet V2 writing through pyarrow.
> >>>>> As per the docs below, pyarrow started writing Parquet in V2 encoding. 
> >>>>> https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
> >>>>> 
> >>>>> version{“1.0”, “2.4”, “2.6”}, default “2.6”
> >>>>> Determine which Parquet logical types are available for use, whether 
> >>>>> the reduced set from the Parquet 1.x.x format or the expanded logical 
> >>>>> types added in later format versions. Files written with version=‘2.4’ 
> >>>>> or ‘2.6’ may not be readable in all Parquet implementations, so 
> >>>>> version=‘1.0’ is likely the choice that maximizes file compatibility. 
> >>>>> UINT32 and some logical types are only available with version ‘2.4’. 
> >>>>> Nanosecond timestamps are only available with version ‘2.6’. Other 
> >>>>> features such as compression algorithms or the new serialized data page 
> >>>>> format must be enabled separately (see ‘compression’ and 
> >>>>> ‘data_page_version’).
> >>>>> 
> >>>>> 
> >>>>> As per the Apache Parquet community, Parquet V2 is not final yet, so it 
> >>>>> is not official. They advise not to use Parquet V2 for writing 
> >>>>> (though the code is available).
> >>>>> 
> >>>>> As per the above, Spark hasn't started using Parquet V2 for writing.
> >>>>> May I know how an unstable/unofficial version is being used in 
> >>>>> pyarrow?
> >>>>> 
>
