Ah, so we have a slight mess on our hands because the patch for PARQUET-458 enabled the use of DataPageV2, which is not forward compatible with older versions because the implementation was fixed (see the JIRA for more details):
https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516

Unfortunately, in Python the version='1.0' / version='2.0' flag is being used for two different purposes:

* Expanded ConvertedType / LogicalType metadata, like unsigned types and nanosecond timestamps
* DataPageV1 vs. DataPageV2 data pages

I think we should separate these concepts and instead have a "compatibility mode" option regarding the ConvertedType/LogicalType annotations and the behavior around conversions when writing unsigned integers, nanosecond timestamps, and other types to Parquet V1 (which is the only "production" Parquet format).

On Wed, Apr 29, 2020 at 5:56 PM Pierre Belzile <pierre.belz...@gmail.com> wrote:
>
> Hi,
>
> We've been using the parquet 2 format (mostly because of nanosecond
> resolution). I'm getting crashes in the C++ parquet decoder, arrow 0.16,
> when decoding a parquet 2 file created with pyarrow 0.17.0. Is this
> expected? Would a 0.17 decode a 0.16?
>
> If that's not expected, I can put the debugger on it and see what is
> happening. I suspect it's with string fields (regular, not large string).
>
> Cheers, Pierre
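
To make the proposed separation concrete, here is a rough sketch of what two independent settings could look like. Everything in it (`WriterOptions`, `DataPageVersion`, `compatibility_mode`, `from_legacy_version`) is a hypothetical name invented for illustration and is not part of the pyarrow API; it only models the idea that the type annotations and the data page format should be chosen separately.

```python
from dataclasses import dataclass
from enum import Enum


class DataPageVersion(Enum):
    """Hypothetical enum: which data page layout the writer emits."""
    V1 = "1.0"
    V2 = "2.0"


@dataclass(frozen=True)
class WriterOptions:
    """Hypothetical writer options, NOT pyarrow's actual API.

    compatibility_mode: if True, restrict ConvertedType/LogicalType
        annotations to what older readers understand (e.g. no unsigned
        types, no nanosecond timestamps).
    data_page_version: chosen independently of the type annotations.
    """
    compatibility_mode: bool = True
    data_page_version: DataPageVersion = DataPageVersion.V1


def from_legacy_version(version: str) -> WriterOptions:
    """Map the overloaded version flag onto the two separate settings.

    This mirrors the coupling described above, where version='2.0'
    turns on both the expanded types and DataPageV2.
    """
    if version == "1.0":
        return WriterOptions(True, DataPageVersion.V1)
    if version == "2.0":
        return WriterOptions(False, DataPageVersion.V2)
    raise ValueError(f"unsupported version: {version!r}")


# With separate settings, a user who only wants nanosecond timestamps
# could keep DataPageV1 pages:
nanos_on_v1_pages = WriterOptions(
    compatibility_mode=False,
    data_page_version=DataPageVersion.V1,
)
```

Under this sketch, the case in the quoted message (nanosecond resolution without DataPageV2) becomes expressible, which the single flag does not allow.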