Thanks.  I similarly noticed that uint32 gets converted to int64.  This
makes some surface sense, as uint32 is a logical type with int64 as the
backing physical type.  However, uint8, uint16, and uint64 all keep their
data types, so I was a little surprised.

On Fri, Dec 6, 2019 at 6:52 AM Wes McKinney <wesmck...@gmail.com> wrote:

> Some notes
>
> * 96-bit nanosecond timestamps are deprecated in the Parquet format by
> default, so we don't write them by default unless you use the
> use_deprecated_int96_timestamps flag
> * 64-bit timestamps are relatively new to the Parquet format; I'm not
> actually sure what's required to write these. Using version='2.0' is
> not safe because our implementation of Parquet V2 data pages is
> incorrect (see PARQUET-458)
>
> So I'd recommend using the deprecated int96 flag if you need
> nanoseconds right now
>
> On Fri, Dec 6, 2019 at 8:50 AM Weston Pace <weston.p...@gmail.com> wrote:
> >
> > If my table has timestamp fields with ns resolution and I save the table to
> > parquet format without specifying any timestamp args (default coerce and
> > legacy settings) then it automatically converts my timestamp to us
> > resolution.
> >
> > As best I can tell Parquet supports ns resolution so I would prefer it just
> > keep that.  Is there some argument I can pass to write_table to get my
> > desired resolution?
> >
> > Here is an example program:
> >
> > import pyarrow as pa
> > import pyarrow.parquet as pq
> >
> > table = pa.table({'mytimestamp': []},
> >                  schema=pa.schema({'mytimestamp': pa.timestamp('ns')}))
> > pq.write_table(table, '/tmp/foo.parquet')
> > table2 = pq.read_table('/tmp/foo.parquet')
> > print(table.schema.field('mytimestamp').type)
> > # timestamp[ns]
> > print(table2.schema.field('mytimestamp').type)
> > # timestamp[us]
>
