Re: [C++] Shall we modify the ORC reader?

Ying Zhou Mon, 11 Jan 2021 18:15:17 -0800

Thanks! What about item 3?
Shall we convert ORC maps to Arrow maps instead of lists of structs whose
fields are named ‘key’ and ‘value’?




> On Jan 10, 2021, at 6:45 PM, Jacques Nadeau <[email protected]> wrote:
> 
> I don't think 1 & 2 make sense. I don't think there are a lot of users
> reading 2 GB strings or lists with 2 billion objects in them. Saying we just
> don't support that pattern seems fine for now. I also believe the string and
> list types have better cross-language support than the large variants.
> 
> On Sun, Jan 10, 2021 at 8:49 AM Ying Zhou <[email protected]> wrote:
> 
>> Hi,
>> 
>> While finishing the ORC writer in C++ I found that the ORC reader treats
>> certain types in rather awkward ways. Hence I filed this Jira ticket:
>> https://issues.apache.org/jira/browse/ARROW-11117
>> 
>> After starting to work on ORC tickets, mostly filed by myself, I began to
>> worry that the type mappings in the ORC reader might already be relied on
>> by Arrow users. I wonder whether we should grandfather the current
>> behavior or gradually switch to a new type mapping.
>> 
>> Here are my proposed changes:
>> 1. The ORC STRING type should be converted to the Arrow LARGE_STRING type
>> instead of the STRING type, since ORC string data can exceed what Arrow's
>> 32-bit offsets can address.
>> 2. The ORC LIST type should be converted to the Arrow LARGE_LIST type
>> instead of the LIST type, for the same reason.
>> 3. The ORC MAP type should be converted to the Arrow MAP type instead of a
>> list of structs with hardcoded field names, as long as the offsets fit
>> into int32. Otherwise the reader should return an error instead of OK.
>> 
>> Thanks,
>> Ying
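
The int32 offset check that proposal 3 relies on can be sketched as below. This is a minimal illustration under stated assumptions, not code from the Arrow ORC adapter: `BuildInt32Offsets` is a hypothetical helper that turns per-row lengths (for example, the number of entries in each ORC map) into the 32-bit offsets used by Arrow's STRING, LIST and MAP types, and it signals overflow with `std::nullopt` instead of returning a result.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper (not an Arrow API): converts per-row lengths into
// int32 offsets. offsets[i] is where row i starts and offsets[i + 1] is
// where it ends. Returns std::nullopt if any length is negative or if the
// running total would exceed INT32_MAX, i.e. the data cannot be represented
// with 32-bit offsets.
std::optional<std::vector<int32_t>> BuildInt32Offsets(
    const std::vector<int64_t>& lengths) {
  constexpr int64_t kMax = std::numeric_limits<int32_t>::max();
  std::vector<int32_t> offsets;
  offsets.reserve(lengths.size() + 1);
  offsets.push_back(0);
  int64_t total = 0;  // always <= kMax, so kMax - total cannot overflow
  for (int64_t len : lengths) {
    // Compare against the remaining headroom rather than computing
    // total + len, which could overflow int64 for huge lengths.
    if (len < 0 || len > kMax - total) {
      return std::nullopt;
    }
    total += len;
    offsets.push_back(static_cast<int32_t>(total));
  }
  return offsets;
}
```

For example, map sizes {3, 0, 2} yield offsets {0, 3, 3, 5}, while {INT32_MAX, 1} yields `std::nullopt`. A reader following proposal 3 would turn the `std::nullopt` case into an error status rather than OK; under proposals 1 and 2 it could instead fall back to 64-bit offsets (LARGE_STRING / LARGE_LIST).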
