Diego Argueta created ARROW-2026:
------------------------------------

             Summary: Timestamps saved as int64 even if use_deprecated_int96_timestamps=True
                 Key: ARROW-2026
                 URL: https://issues.apache.org/jira/browse/ARROW-2026
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
         Environment: OS: Mac OS X 10.13.2
                      Python: 3.6.4
                      PyArrow: 0.8.0
            Reporter: Diego Argueta

When writing to a Parquet file with `use_deprecated_int96_timestamps` set to True, timestamps are written as 96-bit integers only if they have nanosecond resolution. This is a problem because Amazon Redshift timestamps have only microsecond resolution, yet Redshift requires the 96-bit format in Parquet files.

I'd expect the `use_deprecated_int96_timestamps` flag to cause _all_ timestamps to be written as 96 bits, regardless of resolution. If the current behavior is a deliberate design decision, it'd be immensely helpful if it were explicitly documented in the description of the argument.

To reproduce:

1. Create a table with a timestamp column that has microsecond or millisecond resolution, and save it to a Parquet file. Be sure to set `use_deprecated_int96_timestamps` to True.

{code:java}
import datetime

import pyarrow
from pyarrow import parquet

# Schema shown for reference; the table below is built from the arrays directly.
schema = pyarrow.schema([
    pyarrow.field('last_updated', pyarrow.timestamp('us')),
])

# A single timestamp column with microsecond resolution.
data = [
    pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('us')),
]

table = pyarrow.Table.from_arrays(data, ['last_updated'])

with open('test_file.parquet', 'wb') as fdesc:
    parquet.write_table(table, fdesc,
                        use_deprecated_int96_timestamps=True)
{code}

2. Inspect the file. I used parquet-tools:

{noformat}
dak@tux ~ $ parquet-tools meta test_file.parquet
file:         file:/Users/dak/test_file.parquet
creator:      parquet-cpp version 1.3.2-SNAPSHOT

file schema:  schema
--------------------------------------------------------------------------------
last_updated: OPTIONAL INT64 O:TIMESTAMP_MICROS R:0 D:1

row group 1:  RC:1 TS:76 OFFSET:4
--------------------------------------------------------------------------------
last_updated:  INT64 SNAPPY DO:4 FPO:28 SZ:76/72/0.95 VC:1 ENC:PLAIN,PLAIN_DICTIONARY,RLE
{noformat}

The column is stored as INT64 with the TIMESTAMP_MICROS annotation rather than as INT96.