Wes,

You used the words "forward compatible". Does this mean that 0.17 is able
to decode DataPageV2 files written by 0.16?

Crossing my fingers...

Pierre

On Wed, Apr 29, 2020 at 19:05, Wes McKinney <wesmck...@gmail.com> wrote:

> Ah, so we have a slight mess on our hands because the patch for
> PARQUET-458 enabled the use of DataPageV2, which is not forward
> compatible with older versions because the implementation was fixed
> (see the JIRA for more details)
>
>
> https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516
>
> Unfortunately, in Python the version='1.0' / version='2.0' flag is
> being used for two different purposes:
>
> * Expanded ConvertedType / LogicalType metadata, like unsigned types
> and nanosecond timestamps
> * DataPageV1 vs. DataPageV2 data pages
>
> I think we should separate these concepts and instead have a
> "compatibility mode" option regarding the ConvertedType/LogicalType
> annotations and the behavior around conversions when writing unsigned
> integers, nanosecond timestamps, and other types to Parquet V1 (which
> is the only "production" Parquet format).
>
> On Wed, Apr 29, 2020 at 5:56 PM Pierre Belzile <pierre.belz...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > We've been using the parquet 2 format (mostly because of nanosecond
> > resolution). I'm getting crashes in the C++ parquet decoder, arrow 0.16,
> > when decoding a parquet 2 file created with pyarrow 0.17.0. Is this
> > expected? Would 0.17 decode a file written by 0.16?
> >
> > If that's not expected, I can put the debugger on it and see what is
> > happening. I suspect it's with string fields (regular, not large string).
> >
> > Cheers, Pierre
>
