Hi Ying,

Is there a semantic description of the ORC data types somewhere?
I've read through https://orc.apache.org/docs/types.html and
https://orc.apache.org/specification/ORCv1/ but those docs don't seem
to explain the intent and constraints of each of the data types.

Regards

Antoine.




On Mon, 11 Jan 2021 21:15:05 -0500
Ying Zhou <yzhou7...@gmail.com> wrote:
> Thanks! What about 3? 
> Shall we convert ORC maps to Arrow maps as opposed to lists of structs with 
> fields of the structs named ‘key’ and ‘value’?
> 
> 
> 
> > On Jan 10, 2021, at 6:45 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> > 
> > I don't think 1 & 2 make sense. I don't think there are a lot of users
> > reading 2 GB strings or lists with 2 billion objects in them. Saying we just don't
> > support that pattern seems fine for now. I also believe the string and list
> > types have better cross-language support than the large variants.
> > 
> > On Sun, Jan 10, 2021 at 8:49 AM Ying Zhou <yzhou7...@gmail.com> wrote:
> >   
> >> Hi,
> >> 
> >> While finishing the ORC writer in C++ I found that the ORC reader treats
> >> certain types in rather awkward ways. Hence I filed this Jira ticket:
> >> https://issues.apache.org/jira/browse/ARROW-11117
> >> 
> >> After starting to work on ORC tickets mostly filed by myself I began to
> >> worry that the type mappings in the ORC reader might already be used by
> >> users of Arrow. I wonder whether we should grandfather the issues or
> >> gradually switch to a new type mapping.
> >> 
> >> Here are my proposed changes:
> >> 1. The ORC STRING type should be converted to the Arrow LARGE_STRING type
> >> instead of STRING type since it is large.
> >> 2. The ORC LIST type should be converted to the Arrow LARGE_LIST type
> >> instead of LIST type since it is large.
> >> 3. The ORC MAP type should be converted to the Arrow MAP type instead of
> >> a list of structs with hardcoded field names, as long as the offsets fit
> >> into int32. Otherwise we shouldn't return OK.
> >> 
> >> Thanks,
> >> Ying  
> 
> 
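For reference on item 3 of the quoted proposal: Arrow's MAP, LIST and STRING
types store 32-bit offsets, so the conversion can only succeed while the
cumulative element count stays within int32. Below is a minimal Python sketch
of that check. The helper names (build_offsets, fits_int32) are hypothetical
and are not part of the Arrow or ORC APIs; it only illustrates the arithmetic,
not how the C++ adapter does or should implement it.

```python
# Largest value a signed 32-bit offset can hold.
INT32_MAX = 2**31 - 1


def build_offsets(lengths):
    """Return cumulative offsets for per-row element counts.

    `lengths` is assumed to contain non-negative integers, one per row
    (for a map column: the number of key/value entries in that row).
    """
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def fits_int32(lengths):
    """True if every offset for these rows fits in a signed 32-bit integer.

    Offsets are non-decreasing, so only the final one (the total element
    count) needs to be compared against INT32_MAX.
    """
    return sum(lengths) <= INT32_MAX
```

A reader following the proposal would return an error status rather than OK
whenever fits_int32 is false for a map column.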


