As an aside (and probably a discussion for the Parquet community), it would be useful to develop an integration testing framework for Parquet files, similar to what we've done for Arrow (with JSON as the "point of truth"). The limited integration testing across Parquet implementations is definitely concerning.

On Sun, Mar 17, 2019 at 4:26 AM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> Hello Andy,
>
> I guess these files stem from the beginning of the Parquet format, when only
> INT96 timestamps were available. Feel free to add more of them. Using the
> Java implementation is best; it is definitely the reference through age and
> wide usage.
>
> Uwe
>
> > On 17.03.2019 at 00:55, Andy Grove <andygrov...@gmail.com> wrote:
> >
> > Hi,
> >
> > The currently available Parquet files in the Arrow repository don't seem to
> > be sufficient for unit testing support for all date/time data types. For
> > example the "alltypes_plain.parquet" only has one time column and it uses
> > the deprecated INT96 encoding.
> >
> > I guess I'm volunteering to try and create a Parquet test file that covers
> > all types needed, probably using the Java implementation since that would
> > be easiest for me. Before I do this though, am I missing something? It
> > seems odd that we don't already have this?
> >
> > Thanks,
> >
> > Andy.