Thanks. I similarly noticed that uint32 gets converted to int64. This makes some surface sense, as uint32 is a logical type with int64 as the backing physical type. However, uint8, uint16, and uint64 all keep their data types, so I was a little surprised.
On Fri, Dec 6, 2019 at 6:52 AM Wes McKinney <wesmck...@gmail.com> wrote:
> Some notes
>
> * 96-bit nanosecond timestamps are deprecated in the Parquet format by
>   default, so we don't write them by default unless you use the
>   use_deprecated_int96_timestamps flag
> * 64-bit timestamps are relatively new to the Parquet format, I'm not
>   actually sure what's required to write these. Using version='2.0' is
>   not safe because our implementation of Parquet V2 data pages is
>   incorrect (see PARQUET-458)
>
> So I'd recommend using the deprecated int96 flag if you need
> nanoseconds right now
>
> On Fri, Dec 6, 2019 at 8:50 AM Weston Pace <weston.p...@gmail.com> wrote:
> >
> > If my table has timestamp fields with ns resolution and I save the
> > table to parquet format without specifying any timestamp args (default
> > coerce and legacy settings) then it automatically converts my
> > timestamp to us resolution.
> >
> > As best I can tell Parquet supports ns resolution so I would prefer it
> > just keep that. Is there some argument I can pass to write_table to
> > get my desired resolution?
> >
> > Here is an example program:
> >
> > import pyarrow as pa
> > import pyarrow.parquet as pq
> >
> > table = pa.table({'mytimestamp': []}, schema=pa.schema({'mytimestamp':
> >     pa.timestamp('ns')}))
> > pq.write_table(table, '/tmp/foo.parquet')
> > table2 = pq.read_table('/tmp/foo.parquet')
> > print(table.schema.field('mytimestamp').type)
> > # timestamp[ns]
> > print(table2.schema.field('mytimestamp').type)
> > # timestamp[us]