Florian Jetter created ARROW-2194:
-------------------------------------

             Summary: Pandas columns metadata incorrect for empty string columns
                 Key: ARROW-2194
                 URL: https://issues.apache.org/jira/browse/ARROW-2194
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
            Reporter: Florian Jetter

The {{pandas_type}} for {{bytes}} or {{unicode}} columns of an empty pandas DataFrame is unexpectedly {{float64}}:

{code}
import numpy as np
import pandas as pd
import pyarrow as pa
import json

empty_df = pd.DataFrame({'unicode': np.array([], dtype=np.unicode_),
                         'bytes': np.array([], dtype=np.bytes_)})
empty_table = pa.Table.from_pandas(empty_df)
json.loads(empty_table.schema.metadata[b'pandas'])['columns']

# Same behavior for input dtype np.unicode_
[{u'field_name': u'bytes',
  u'metadata': None,
  u'name': u'bytes',
  u'numpy_type': u'object',
  u'pandas_type': u'float64'},
 {u'field_name': u'unicode',
  u'metadata': None,
  u'name': u'unicode',
  u'numpy_type': u'object',
  u'pandas_type': u'float64'},
 {u'field_name': u'__index_level_0__',
  u'metadata': None,
  u'name': None,
  u'numpy_type': u'int64',
  u'pandas_type': u'int64'}]
{code}

Tested on Debian 8 with Python 2.7 and Python 3.6.4.
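
To make the mismatch easier to spot, the following is a minimal sketch that scans the column metadata reported above for entries whose {{numpy_type}} is {{object}} while {{pandas_type}} is {{float64}}. The column list is copied from the output in this report (written as Python 3 literals); the helper name {{find_mismatched_columns}} is hypothetical and not part of pyarrow. It uses only plain dictionaries, so it does not reproduce the bug itself, it only illustrates which entries look wrong.

```python
# Column metadata as reported in this issue (copied from the output above).
columns = [
    {"field_name": "bytes", "metadata": None, "name": "bytes",
     "numpy_type": "object", "pandas_type": "float64"},
    {"field_name": "unicode", "metadata": None, "name": "unicode",
     "numpy_type": "object", "pandas_type": "float64"},
    {"field_name": "__index_level_0__", "metadata": None, "name": None,
     "numpy_type": "int64", "pandas_type": "int64"},
]


def find_mismatched_columns(columns):
    """Hypothetical helper (not a pyarrow API).

    Returns the field names of columns stored with an object dtype
    but described as float64 in the pandas metadata.
    """
    return [
        col["field_name"]
        for col in columns
        if col["numpy_type"] == "object" and col["pandas_type"] == "float64"
    ]


mismatched = find_mismatched_columns(columns)
print(mismatched)  # ['bytes', 'unicode']
```

Both string columns are flagged, while the index column, whose two type fields agree, is not.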