Hi,

While finishing the ORC writer in C++ I found that the ORC reader treats 
certain types in rather awkward ways. Hence I filed this Jira ticket: 
https://issues.apache.org/jira/browse/ARROW-11117 

After starting to work on these ORC tickets, most of which I filed myself, I 
began to worry that Arrow users might already depend on the ORC reader's 
current type mappings. I wonder whether we should grandfather the current 
mappings or gradually switch to a new type mapping.

Here are my proposed changes:
1. The ORC STRING type should be converted to the Arrow LARGE_STRING type 
instead of the STRING type, since ORC string data can be large enough to 
overflow the 32-bit offsets that STRING uses.
2. The ORC LIST type should be converted to the Arrow LARGE_LIST type instead 
of the LIST type, for the same reason: ORC lists can be large enough to 
overflow 32-bit offsets.
3. The ORC MAP type should be converted to the Arrow MAP type, instead of a 
list of structs with hardcoded field names, as long as the offsets fit into 
int32. Otherwise the reader should return an error status rather than OK.
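
To make item 3 concrete, here is a minimal sketch of the "offsets fit into 
int32" check. It is not taken from the Arrow or ORC code base: the helper name 
NarrowOffsets and its signature are hypothetical, and plain std::vector stands 
in for the real offset buffers. A bool return stands in for the OK / not-OK 
status.

```cpp
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper (not an Arrow or ORC API): tries to narrow 64-bit
// offsets to the 32-bit offsets that the Arrow MAP type uses.
// Returns true and fills *out if every offset fits into int32.
// Returns false and leaves *out untouched otherwise, which is the case
// where the reader would report an error instead of OK.
bool NarrowOffsets(const std::vector<int64_t>& offsets,
                   std::vector<int32_t>* out) {
  std::vector<int32_t> narrowed;
  narrowed.reserve(offsets.size());
  for (int64_t offset : offsets) {
    if (offset < 0 || offset > std::numeric_limits<int32_t>::max()) {
      return false;  // does not fit into int32
    }
    narrowed.push_back(static_cast<int32_t>(offset));
  }
  *out = std::move(narrowed);
  return true;
}
```

A caller would build the Arrow MAP array only when this returns true and 
otherwise propagate a failure to the user.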

Thanks,
Ying
