I don't think 1 & 2 make sense. I don't think many users are reading
2 GB strings or lists with 2 billion objects in them, so saying we just
don't support that pattern seems fine for now. I also believe the string and
list types have better cross-language support than the large variants.
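
To make the sizes concrete: the plain string and list types store 32-bit
signed offsets, while the large variants store 64-bit ones, so the limit
applies to the cumulative length within a single array. A rough sketch of
that check (plain Python for brevity, not Arrow API; the helper name is
made up for illustration):

```python
# Sketch only: models the offset-width limit, not any real Arrow function.
INT32_MAX = 2**31 - 1  # largest offset a plain string/list array can store


def fits_in_int32_offsets(value_lengths):
    """Return True if the cumulative lengths can all be stored as int32 offsets."""
    total = 0
    for length in value_lengths:
        total += length
        if total > INT32_MAX:
            return False
    return True
```

Anything for which this returns False would need the large variants (or
chunking into several arrays), which is the pattern I'm suggesting we don't
support for now.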

On Sun, Jan 10, 2021 at 8:49 AM Ying Zhou <yzhou7...@gmail.com> wrote:

> Hi,
>
> While finishing the ORC writer in C++ I found that the ORC reader treats
> certain types in rather awkward ways. Hence I filed this Jira ticket:
> https://issues.apache.org/jira/browse/ARROW-11117
>
> After starting work on ORC tickets, most of which I filed myself, I began
> to worry that users of Arrow might already rely on the type mappings in
> the ORC reader. I wonder whether we should grandfather the existing
> mappings or gradually switch to a new type mapping.
>
> Here are my proposed changes:
> 1. The ORC STRING type should be converted to the Arrow LARGE_STRING type
> instead of STRING type since it is large.
> 2. The ORC LIST type should be converted to the Arrow LARGE_LIST type
> instead of LIST type since it is large.
> 3. The ORC MAP type should be converted to the Arrow MAP type instead of
> a list of structs with hardcoded field names, as long as
> the offsets fit into int32. Otherwise we shouldn't return OK.
>
> Thanks,
> Ying
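
For reference, the offset condition in point 3 of the quoted proposal could
be sketched roughly as below. This is an illustration in plain Python, not
the reader's actual code: it assumes the reader sees the map offsets as
64-bit integers, and the function name and the choice of exception are
hypothetical stand-ins for returning a non-OK status.

```python
# Sketch only: hypothetical helper, not part of Arrow or ORC.
INT32_MAX = 2**31 - 1  # largest offset an Arrow MAP array can store


def narrow_map_offsets(offsets):
    """Return the offsets as a list if every value fits in int32, else raise."""
    for i, off in enumerate(offsets):
        if off < 0 or off > INT32_MAX:
            # Stand-in for "we shouldn't return OK" in the proposal.
            raise OverflowError(f"offset {off} at index {i} does not fit in int32")
    return list(offsets)
```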
