On Wed, Apr 29, 2020 at 6:15 PM Pierre Belzile <pierre.belz...@gmail.com> wrote:
>
> Wes,
>
> You used the words "forward compatible". Does this mean that 0.17 is able
> to decode 0.16 datapagev2?

0.16 doesn't write DataPageV2 at all; the version flag only determines
the type casting and metadata behavior I indicated in my email. The
changes in

https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516

enabled the use of DataPageV2 and I/we didn't think about the forward
compatibility issue (version=2.0 files written in 0.17.0 being
unreadable in 0.16.0). We might actually want to revert this (just the
toggle between DataPageV1/V2, not the whole patch).

> Crossing my fingers...
>
> Pierre
>
> On Wed, Apr 29, 2020 at 19:05, Wes McKinney <wesmck...@gmail.com> wrote:
>
> > Ah, so we have a slight mess on our hands because the patch for
> > PARQUET-458 enabled the use of DataPageV2, which is not forward
> > compatible with older version because the implementation was fixed
> > (see the JIRA for more details)
> >
> > https://github.com/apache/arrow/commit/809d40ab9518bd254705f35af01162a9da588516
> >
> > Unfortunately, in Python the version='1.0' / version='2.0' flag is
> > being used for two different purposes:
> >
> > * Expanded ConvertedType / LogicalType metadata, like unsigned types
> >   and nanosecond timestamps
> > * DataPageV1 vs. DataPageV2 data pages
> >
> > I think we should separate these concepts and instead have a
> > "compatibility mode" option regarding the ConvertedType/LogicalType
> > annotations and the behavior around conversions when writing unsigned
> > integers, nanosecond timestamps, and other types to Parquet V1 (which
> > is the only "production" Parquet format).
> >
> > On Wed, Apr 29, 2020 at 5:56 PM Pierre Belzile <pierre.belz...@gmail.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > We've been using the parquet 2 format (mostly because of nanosecond
> > > resolution). I'm getting crashes in the C++ parquet decoder, arrow 0.16,
> > > when decoding a parquet 2 file created with pyarrow 0.17.0. Is this
> > > expected? Would a 0.17 decode a 0.16?
> > >
> > > If that's not expected, I can put the debugger on it and see what is
> > > happening. I suspect it's with string fields (regular, not large string).
> > >
> > > Cheers, Pierre
> >
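
To make the two roles of the version flag discussed in this thread concrete, here is a minimal Python sketch. It is not pyarrow code: `WriterOptions`, `options_from_legacy_flag`, and `options_decoupled` are hypothetical names invented for illustration. The first function models a single flag that toggles both the type metadata and the data page layout, as described for 0.17.0; the second models the proposed separation, assuming DataPageV1 stays the default.

```python
from dataclasses import dataclass


# Hypothetical illustration only; none of these names exist in pyarrow.
@dataclass(frozen=True)
class WriterOptions:
    # Whether expanded ConvertedType / LogicalType metadata is written
    # (e.g. unsigned types, nanosecond timestamps).
    extended_types: bool
    # Data page layout: 1 for DataPageV1, 2 for DataPageV2.
    data_page_version: int


def options_from_legacy_flag(version: str) -> WriterOptions:
    """Model one flag controlling both concerns at once."""
    if version not in ("1.0", "2.0"):
        raise ValueError(f"unsupported version: {version!r}")
    is_v2 = version == "2.0"
    return WriterOptions(
        extended_types=is_v2,
        data_page_version=2 if is_v2 else 1,
    )


def options_decoupled(extended_types: bool,
                      data_page_version: int = 1) -> WriterOptions:
    """Model the proposed split: two independent settings.

    Assumption: DataPageV1 is the default, so files stay readable by
    older decoders unless the caller opts in to DataPageV2.
    """
    if data_page_version not in (1, 2):
        raise ValueError(f"unsupported data page version: {data_page_version}")
    return WriterOptions(extended_types, data_page_version)


if __name__ == "__main__":
    # Coupled: asking for nanosecond timestamps also switches page layout.
    print(options_from_legacy_flag("2.0"))
    # Decoupled: extended types with the older, widely readable page layout.
    print(options_decoupled(extended_types=True))
```

With the decoupled form, a user who only wants nanosecond resolution would not implicitly opt in to a page layout that an older reader cannot decode.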