Florian Jetter created ARROW-5104:
-------------------------------------

             Summary: [Python/C++] Schema for empty tables include index column as integer
                 Key: ARROW-5104
                 URL: https://issues.apache.org/jira/browse/ARROW-5104
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.13.0
            Reporter: Florian Jetter

The schema for an empty table/dataframe still includes the index as an integer column instead of serializing it solely as a metadata reference (see ARROW-1639).

In the example below, the empty dataframe still holds `__index_level_0__` as an integer column. The proper behavior would be to exclude it and reference the index information in the pandas metadata, as is the case for a non-empty dataframe.

{code}
In [1]: import pandas as pd

In [2]: import pyarrow as pa

In [3]: non_empty = pd.DataFrame({"col": [1]})

In [4]: empty = non_empty.drop(0)

In [5]: empty
Out[5]:
Empty DataFrame
Columns: [col]
Index: []

In [6]: pa.Table.from_pandas(non_empty)
Out[6]:
pyarrow.Table
col: int64
metadata
--------
OrderedDict([(b'pandas',
              b'{"index_columns": [{"kind": "range", "name": null, "start": '
              b'0, "stop": 1, "step": 1}], "column_indexes": [{"name": null,'
              b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
              b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
              b'{"name": "col", "field_name": "col", "pandas_type": "int64",'
              b' "numpy_type": "int64", "metadata": null}], "creator": {"lib'
              b'rary": "pyarrow", "version": "0.13.0"}, "pandas_version": nu'
              b'll}')])

In [7]: pa.Table.from_pandas(empty)
Out[7]:
pyarrow.Table
col: int64
__index_level_0__: int64
metadata
--------
OrderedDict([(b'pandas',
              b'{"index_columns": ["__index_level_0__"], "column_indexes": ['
              b'{"name": null, "field_name": null, "pandas_type": "unicode",'
              b' "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}]'
              b', "columns": [{"name": "col", "field_name": "col", "pandas_t'
              b'ype": "int64", "numpy_type": "int64", "metadata": null}, {"n'
              b'ame": null, "field_name": "__index_level_0__", "pandas_type"'
              b': "int64", "numpy_type": "int64", "metadata": null}], "creat'
              b'or": {"library": "pyarrow", "version": "0.13.0"}, "pandas_ve'
              b'rsion": null}')])

In [8]: pa.__version__
Out[8]: '0.13.0'

In [9]: ! python --version
Python 3.6.7
{code}
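
For anyone triaging this, the difference between the two outputs can be checked mechanically from the `index_columns` entry of the pandas metadata. The following is only a minimal sketch: the helper name `classify_index_levels` and the two abridged metadata constants are hypothetical, introduced here for illustration and not part of pyarrow. It assumes the layout visible in the listings above, where an index level stored as a real column appears as a field-name string and a level kept only in metadata appears as a descriptor object.

```python
import json


def classify_index_levels(schema_metadata):
    """Split the index levels recorded in the b'pandas' schema metadata.

    Returns a tuple (as_columns, as_descriptors):
      as_columns     -- field names of index levels stored as real columns
      as_descriptors -- dict descriptors of levels stored only in metadata
    """
    pandas_meta = json.loads(schema_metadata[b"pandas"].decode("utf-8"))
    as_columns, as_descriptors = [], []
    for level in pandas_meta["index_columns"]:
        if isinstance(level, dict):
            # e.g. {"kind": "range", ...}: no column is materialized
            as_descriptors.append(level)
        else:
            # e.g. "__index_level_0__": the index is a column in the schema
            as_columns.append(level)
    return as_columns, as_descriptors


# Abridged from the Out[6] and Out[7] metadata in the report; only the
# "index_columns" entry is kept (illustrative constants, not pyarrow output).
NON_EMPTY_META = {
    b"pandas": b'{"index_columns": [{"kind": "range", "name": null, '
               b'"start": 0, "stop": 1, "step": 1}]}'
}
EMPTY_META = {b"pandas": b'{"index_columns": ["__index_level_0__"]}'}

print(classify_index_levels(NON_EMPTY_META))
print(classify_index_levels(EMPTY_META))
```

With pyarrow installed, the same helper should accept `pa.Table.from_pandas(df).schema.metadata` directly, since that attribute is a dict with bytes keys and values. A non-empty first list for the empty dataframe would correspond to the behavior reported here.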