Re: Parquet test files with all data types

2019-03-17 Thread Wes McKinney
As an aside (and probably a discussion for the Parquet community), it would be useful to develop an integration testing framework for Parquet files, similar to what we've done for Arrow (with JSON as the "point of truth"). The limited integration testing across Parquet implementations is definitely conc
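
To make the JSON "point of truth" idea concrete, here is a minimal sketch of what such a check could look like. Everything in it is an assumption made for illustration: the JSON layout, the `TRUTH` document, and the `find_mismatches` helper are hypothetical and are not Arrow's actual integration format. The idea is only that each Parquet implementation decodes the same file and its output is compared against one shared JSON description.

```python
import json

# Hypothetical "point of truth" document: column names/types plus the
# expected rows. The layout is invented for this sketch.
TRUTH = """
{
  "columns": [
    {"name": "id", "type": "INT32"},
    {"name": "created", "type": "TIMESTAMP_MILLIS"}
  ],
  "rows": [[1, 1552694400000], [2, null]]
}
"""


def find_mismatches(truth_text, columns, rows):
    """Compare what a reader decoded against the JSON description.

    columns: list of (name, type) tuples reported by the reader.
    rows: list of row tuples decoded by the reader.
    Returns a list of human-readable problems (empty if all match).
    """
    truth = json.loads(truth_text)
    problems = []

    expected_columns = [(c["name"], c["type"]) for c in truth["columns"]]
    if list(columns) != expected_columns:
        problems.append(
            f"schema: expected {expected_columns}, got {list(columns)}"
        )

    if len(rows) != len(truth["rows"]):
        problems.append(
            f"row count: expected {len(truth['rows'])}, got {len(rows)}"
        )

    # zip stops at the shorter input; the row-count check above
    # already reports any difference in length.
    for i, (want, got) in enumerate(zip(truth["rows"], rows)):
        if list(got) != want:
            problems.append(f"row {i}: expected {want}, got {list(got)}")

    return problems
```

A test harness along these lines would run `find_mismatches` once per implementation and fail if any returned list is non-empty.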

Re: Parquet test files with all data types

2019-03-17 Thread Uwe L. Korn
Hello Andy, I guess these files stem from the beginning of the Parquet format, when only INT96 timestamps were available. Feel free to add more of them. Using the Java implementation is best; given its age and wide usage, it is definitely the reference. Uwe > On 17.03.2019 at 00:55, And

Parquet test files with all data types

2019-03-16 Thread Andy Grove
Hi, The currently available Parquet files in the Arrow repository don't seem to be sufficient for unit testing support for all date/time data types. For example, "alltypes_plain.parquet" has only one time column, and it uses the deprecated INT96 encoding. I guess I'm volunteering to try and cre
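
For readers unfamiliar with the INT96 timestamps mentioned in this thread, here is a minimal sketch of how such a value is commonly laid out: 12 bytes, little-endian, with the nanoseconds within the day in the first 8 bytes and the Julian day number in the last 4. The function names are hypothetical, and treating the value as UTC is an assumption of this sketch, since writers have differed on time zone handling.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_int96_timestamp(raw):
    """Decode a 12-byte INT96 timestamp into a UTC datetime (assumed UTC)."""
    if len(raw) != 12:
        raise ValueError("INT96 value must be exactly 12 bytes")
    # "<qi": little-endian int64 (nanos of day) followed by int32 (Julian day).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime only has microsecond resolution, so the last three
    # nanosecond digits are dropped here.
    return EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)


def encode_int96_timestamp(ts):
    """Encode a timezone-aware datetime as a 12-byte INT96 timestamp."""
    delta = ts - EPOCH
    # timedelta normalizes so that seconds and microseconds are non-negative
    # even when days is negative (timestamps before 1970).
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1000
    return struct.pack("<qi", nanos_of_day, delta.days + JULIAN_DAY_OF_UNIX_EPOCH)
```

A round trip such as `decode_int96_timestamp(encode_int96_timestamp(ts))` should return `ts` unchanged for any timezone-aware `ts`, which makes a convenient sanity check when generating new test files.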