Florian Jetter created ARROW-8142:
-------------------------------------

             Summary: [Python/C++] Casting empty table from after parquet roundtrip causes critical failure
                 Key: ARROW-8142
                 URL: https://issues.apache.org/jira/browse/ARROW-8142
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Florian Jetter

When casting an empty table from a dictionary-encoded type to a non-dictionary-encoded type, a critical error is raised and not handled, causing the interpreter to shut down. This only happens after a Parquet roundtrip.

{code:python}
import pyarrow as pa
import pandas as pd
import pyarrow.parquet as pq

# Empty DataFrame with a categorical column (dictionary-encoded in Arrow)
df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
table = pa.Table.from_pandas(df)

# Target field: same name, nullability and metadata, but the plain value type
field = table.schema[0]
new_field = pa.field(field.name, field.type.value_type, field.nullable, field.metadata)

# Parquet roundtrip through an in-memory buffer
buf = pa.BufferOutputStream()
pq.write_table(table, buf)
reader = pa.BufferReader(buf.getvalue().to_pybytes())
table = pq.read_table(reader)

# Cast the roundtripped table to the non-dictionary schema
schema = table.schema.remove(0).insert(0, new_field)
new_table = table.cast(schema)
assert new_table.schema == schema
{code}

Output

{code:java}
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0318 09:55:14.266649 299722176 table.cc:47] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)