Re: parquet 2 incompatibility between 0.16 and 0.17?

Wes McKinney Wed, 29 Apr 2020 16:06:17 -0700

Ah, so we have a slight mess on our hands because the patch for
PARQUET-458 enabled the use of DataPageV2, which is not forward
compatible with older versions because the DataPageV2 implementation
has since been fixed (see the JIRA for more details):

https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516

Unfortunately, in Python the version='1.0' / version='2.0' flag is
being used for two different purposes:

* Expanded ConvertedType / LogicalType metadata, like unsigned types
and nanosecond timestamps
* DataPageV1 vs. DataPageV2 data pages

I think we should separate these concepts and instead have a
"compatibility mode" option regarding the ConvertedType/LogicalType
annotations and the behavior around conversions when writing unsigned
integers, nanosecond timestamps, and other types to Parquet V1 (which
is the only "production" Parquet format).
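
To make the proposed separation concrete, here is a rough sketch of
what two independent options could look like. Every name in it
(WriterOptions, annotation_mode, from_legacy_version) is hypothetical
and is not part of the pyarrow API; it only illustrates splitting the
overloaded flag into one setting for type annotations and another for
the data page layout.

```python
from dataclasses import dataclass


# Hypothetical names throughout: this is NOT the pyarrow API, only a
# sketch of separating the two concerns the version flag conflates.
@dataclass(frozen=True)
class WriterOptions:
    # Which ConvertedType/LogicalType annotations may be written.
    # "compat" restricts output to what Parquet V1 readers understand;
    # "expanded" allows e.g. unsigned types and nanosecond timestamps.
    annotation_mode: str = "compat"
    # Which data page layout to emit, chosen independently:
    # 1 = DataPageV1, 2 = DataPageV2.
    data_page_version: int = 1


def from_legacy_version(version: str) -> WriterOptions:
    """Map the overloaded version flag onto the two separate options."""
    if version == "1.0":
        return WriterOptions(annotation_mode="compat", data_page_version=1)
    if version == "2.0":
        # With the flag overloaded, both settings change together.
        return WriterOptions(annotation_mode="expanded", data_page_version=2)
    raise ValueError(f"unknown version: {version!r}")


# With separate options, expanded annotations can be requested while
# still writing DataPageV1 pages, which the single flag cannot express.
nanos_on_v1_pages = WriterOptions(annotation_mode="expanded",
                                  data_page_version=1)
```

Under this sketch, someone who only wants nanosecond timestamps would
pick the expanded annotations and leave the data page version at 1.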

On Wed, Apr 29, 2020 at 5:56 PM Pierre Belzile <pierre.belz...@gmail.com> wrote:
>
> Hi,
>
> We've been using the parquet 2 format (mostly because of nanosecond
> resolution). I'm getting crashes in the C++ parquet decoder, arrow 0.16,
> when decoding a parquet 2 file created with pyarrow 0.17.0. Is this
> expected? Would a 0.17 decode a 0.16?
>
> If that's not expected, I can put the debugger on it and see what is
> happening. I suspect it's with string fields (regular, not large string).
>
> Cheers, Pierre
