Hi,

While finishing the ORC writer in C++ I found that the ORC reader treats certain types in rather awkward ways. Hence I filed this Jira ticket: https://issues.apache.org/jira/browse/ARROW-11117
After starting to work on ORC tickets, mostly filed by myself, I began to worry that users of Arrow might already rely on the type mappings in the ORC reader. I wonder whether we should grandfather the current behavior or gradually switch to a new type mapping.

Here are my proposed changes:

1. The ORC STRING type should be converted to the Arrow LARGE_STRING type instead of the STRING type, since it is large.
2. The ORC LIST type should be converted to the Arrow LARGE_LIST type instead of the LIST type, since it is large.
3. The ORC MAP type should be converted to the Arrow MAP type, instead of a list of structs with hardcoded field names, as long as the offsets fit into int32. Otherwise we shouldn't return OK.

Thanks,
Ying