Joris Van den Bossche created ARROW-6187:
--------------------------------------------
Summary: [C++] fallback to storage type when writing ExtensionType to Parquet
Key: ARROW-6187
URL: https://issues.apache.org/jira/browse/ARROW-6187
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Joris Van den Bossche
Writing a table that contains an ExtensionType array to a Parquet file is not
yet implemented. It currently raises "ArrowNotImplementedError: Unhandled type
for Arrow to Parquet schema conversion: extension<arrow.py_extension_type>"
(for a PyExtensionType in this case).
I think minimal support can consist of writing the storage type / array.
We also might want to save the extension name and metadata in the Parquet
FileMetadata.
Later on, this could potentially be used to restore the extension type when
reading. This is related to other issues that need to save the Arrow schema
(categorical: ARROW-5480, time zones: ARROW-5888). In this case, however, we
probably want to store the serialised type in addition to the schema (which
only has the extension type's name).
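
To make the proposal concrete, here is a rough sketch of the idea in plain
Python. It is a toy model only and does not use the Arrow or Parquet APIs:
ExtType, to_storage_schema, restore_schema and the "ext.<column>.*" metadata
keys are all hypothetical names made up for illustration. It shows the
fallback to the storage type on write, the extension name and serialised type
being recorded as file-level key-value metadata, and how a reader could later
use that metadata to restore the extension type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtType:
    """Toy stand-in for an extension type (hypothetical, not the Arrow API)."""
    name: str          # extension name, e.g. "my.uuid"
    storage: str       # name of the underlying storage type, e.g. "binary"
    serialized: bytes  # serialised type parameters


# In this model a schema is a list of (column name, type) pairs, where a type
# is either a plain storage type name (str) or an ExtType.

def to_storage_schema(schema):
    """Write side: replace each extension type by its storage type and
    record the dropped information as key-value metadata."""
    fields, metadata = [], {}
    for column, typ in schema:
        if isinstance(typ, ExtType):
            fields.append((column, typ.storage))
            # Assumed key layout; a real implementation would pick its own.
            metadata[f"ext.{column}.name"] = typ.name
            metadata[f"ext.{column}.serialized"] = typ.serialized.hex()
        else:
            fields.append((column, typ))
    return fields, metadata


def restore_schema(fields, metadata):
    """Read side: rebuild extension types for columns that have metadata."""
    restored = []
    for column, storage in fields:
        name = metadata.get(f"ext.{column}.name")
        if name is None:
            restored.append((column, storage))
        else:
            raw = bytes.fromhex(metadata[f"ext.{column}.serialized"])
            restored.append((column, ExtType(name, storage, raw)))
    return restored
```

For example, a schema [("id", "int64"), ("uuid", ExtType("my.uuid", "binary",
b"\x01\x02"))] would be written with the fields [("id", "int64"), ("uuid",
"binary")], and restore_schema would give back the original schema from those
fields plus the metadata. A reader that ignores the metadata still sees valid
storage columns, which is the minimal support described above.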