[ https://issues.apache.org/jira/browse/ARROW-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-4967:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/16034

> [C++] Parquet: Object type and stats lost when using 96-bit timestamps
> ----------------------------------------------------------------------
>
>                 Key: ARROW-4967
>                 URL: https://issues.apache.org/jira/browse/ARROW-4967
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.12.1
>         Environment: PyArrow: 0.12.1
> Python: 2.7.15, 3.7.2
> Pandas: 0.24.2
>            Reporter: Diego Argueta
>            Priority: Minor
>              Labels: parquet
>
> Run the following code:
> {code:python}
> import datetime as dt
>
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> dataframe = pd.DataFrame({'foo': [dt.datetime.now()]})
> table = pa.Table.from_pandas(dataframe, preserve_index=False)
>
> # Default: the timestamp column is written with the INT64 physical type.
> pq.write_table(table, 'int64.parq')
> # Legacy: the same column is written with the deprecated INT96 physical type.
> pq.write_table(table, 'int96.parq', use_deprecated_int96_timestamps=True)
> {code}
>
> Examining the {{int64.parq}} file, we see that the column metadata includes
> an original type ({{O:TIMESTAMP_MICROS}}) and column statistics. All is well.
> {code}
> file schema: schema
> --------------------------------------------------------------------------------
> foo: OPTIONAL INT64 O:TIMESTAMP_MICROS R:0 D:1
> row group 1: RC:1 TS:76 OFFSET:4
> --------------------------------------------------------------------------------
> foo: INT64 SNAPPY ... ST:[min: 2019-12-31T23:59:59.999000, max: 2019-12-31T23:59:59.999000, num_nulls: 0]
> {code}
>
> However, if we look at {{int96.parq}}, this metadata appears to be lost:
> there is no original type and there are no column statistics.
> {code}
> file schema: schema
> --------------------------------------------------------------------------------
> foo: OPTIONAL INT96 R:0 D:1
> row group 1: RC:1 TS:58 OFFSET:4
> --------------------------------------------------------------------------------
> foo: INT96 SNAPPY ... ST:[no stats for this column]
> {code}
>
> This is confusing, since the metadata for exactly the same data looks
> different depending on whether a seemingly unrelated flag
> ({{use_deprecated_int96_timestamps}}) is set.
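Some background that may account for both symptoms. This is hedged and based on the Parquet format specification rather than on anything stated in the report above. INT96 is a legacy physical type with no converted or logical type annotation for timestamps, and parquet.thrift lists the sort order of INT96 as undefined, which would leave a writer with no well-defined min/max to record. The Python sketch below assumes the commonly used Impala/Spark INT96 layout (8 bytes of little-endian nanoseconds within the day, then a 4-byte little-endian Julian Day Number) and treats naive datetimes as UTC. It shows that comparing the raw 12 bytes does not follow chronological order. `encode_int96` and `decode_int96` are hypothetical helpers written for illustration, not part of pyarrow.

```python
import datetime as dt
import struct

# Assumption: the widely used Impala/Spark INT96 layout, i.e. 8 bytes of
# little-endian nanoseconds since midnight followed by a 4-byte
# little-endian Julian Day Number. Naive datetimes are treated as UTC.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian Day Number of 1970-01-01
NANOS_PER_DAY = 86_400 * 10**9
UNIX_EPOCH = dt.datetime(1970, 1, 1)


def encode_int96(ts: dt.datetime) -> bytes:
    """Hypothetical helper: pack a naive datetime into 12 INT96 bytes."""
    delta = ts - UNIX_EPOCH
    total_nanos = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    # divmod floors, so nanos_of_day always lies in [0, NANOS_PER_DAY).
    days, nanos_of_day = divmod(total_nanos, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def decode_int96(raw: bytes) -> dt.datetime:
    """Hypothetical helper: unpack 12 INT96 bytes (microsecond precision)."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return UNIX_EPOCH + dt.timedelta(days=days, microseconds=nanos_of_day // 1_000)


if __name__ == "__main__":
    earlier = dt.datetime(2019, 12, 31, 23, 59, 59, 999000)
    later = dt.datetime(2020, 1, 1)
    print(earlier < later)  # True: chronological order
    # Raw bytes compare lexicographically. The time-of-day field comes before
    # the day field, so a later day with a smaller time of day sorts first.
    print(encode_int96(earlier) < encode_int96(later))  # False
```

A writer that computed min/max by comparing these raw bytes would report the two values in the wrong order, which is consistent with leaving INT96 statistics out rather than writing misleading ones.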