It probably makes sense to make this option configurable.  I think it is OK
to change the default to use Maps.  My guess is the initial ORC
implementation predated having a Map type in the specification.
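
To make the two mappings concrete, here is a minimal pure-Python sketch (it
does not use the Arrow or ORC APIs). It relies on the fact that an Arrow MAP
has the same physical layout as a list of key/value structs, namely int32
offsets over flattened keys and values, so only the logical type differs.
The names convert_orc_map, MapColumn and use_arrow_map are hypothetical;
use_arrow_map stands in for the configurable option, defaulting to Maps.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple

INT32_MAX = 2**31 - 1  # largest offset a 32-bit list/map offset can hold


@dataclass
class MapColumn:
    """Toy model of a map column: offsets plus flattened keys and values."""
    logical_type: str
    offsets: List[int]
    keys: List[Any]
    values: List[Any]


def convert_orc_map(
    rows: Sequence[Sequence[Tuple[Any, Any]]],
    use_arrow_map: bool = True,  # hypothetical option name
) -> MapColumn:
    """Flatten per-row (key, value) pairs into offsets + child arrays."""
    offsets, keys, values = [0], [], []
    for row in rows:
        for key, value in row:
            keys.append(key)
            values.append(value)
        offsets.append(len(keys))
    # Proposal 3: refuse to convert if the offsets do not fit into int32.
    if offsets[-1] > INT32_MAX:
        raise OverflowError("map offsets do not fit into int32")
    logical_type = (
        "map<key, value>" if use_arrow_map else "list<struct<key, value>>"
    )
    return MapColumn(logical_type, offsets, keys, values)


rows = [[("a", 1), ("b", 2)], [], [("c", 3)]]
as_map = convert_orc_map(rows)                       # new default
as_list = convert_orc_map(rows, use_arrow_map=False)  # legacy behaviour
print(as_map.logical_type, as_map.offsets)
print(as_list.logical_type, as_list.offsets)
```

Both calls produce identical offsets and child data, which is why switching
the default mainly affects the schema that downstream users see.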

On Thu, Jan 28, 2021 at 9:28 AM Ying Zhou <yzhou7...@gmail.com> wrote:

> Hi,
>
> Thanks a lot, Deepak!
>
> I really want to edit the ORC reader to read ORC MAPs as Arrow MAPs now,
> and it’s not a serious hassle to do so. Is there anyone who needs the
> read-ORC-maps-as-lists-of-structs functionality? If not, I will likely do
> it in my current PR.
>
> Ying
>
> > On Jan 19, 2021, at 8:45 PM, Deepak Majeti <majeti.dee...@gmail.com> wrote:
> >
> > Hi Ying,
> >
> > I can help review/merge any ORC C++ contributions.
> >
> >
> > On Thu, Jan 14, 2021 at 6:57 PM Ying Zhou <yzhou7...@gmail.com> wrote:
> >
> >> Well, I haven’t found any. Thankfully ORC does work and I can figure out
> >> how it works by testing with simple examples. However, I have never managed
> >> to contact the ORC community at all. They have never responded to any of my
> >> emails to d...@orc.apache.org. I do want to add Snappy write support (which
> >> was actually already done 2 years ago by someone else, but due to a lack of
> >> unit tests it was never merged into master; I can write the tests) and maybe
> >> Decimal256 to ORC C++ if they are willing to review and merge them. If
> >> anyone has successfully contacted the ORC community, please let me know how.
> >>
> >> Best,
> >> Ying
> >>
> >>> On Jan 14, 2021, at 8:39 AM, Antoine Pitrou <anto...@python.org> wrote:
> >>>
> >>>
> >>> Hi Ying,
> >>>
> >>> Is there a semantic description of the ORC data types somewhere?
> >>> I've read through https://orc.apache.org/docs/types.html and
> >>> https://orc.apache.org/specification/ORCv1/ but those docs don't seem
> >>> to explain the intent and constraints of each of the data types.
> >>>
> >>> Regards
> >>>
> >>> Antoine.
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, 11 Jan 2021 21:15:05 -0500
> >>> Ying Zhou <yzhou7...@gmail.com> wrote:
> >>>> Thanks! What about 3?
> >>>> Shall we convert ORC maps to Arrow maps as opposed to lists of structs
> >>>> with fields of the structs named ‘key’ and ‘value’?
> >>>>
> >>>>
> >>>>
> >>>>> On Jan 10, 2021, at 6:45 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> >>>>>
> >>>>> I don't think 1 & 2 make sense. I don't think there are a lot of users
> >>>>> reading 2 GB strings or lists with 2B objects in them. Saying we just
> >>>>> don't support that pattern seems fine for now. I also believe the string
> >>>>> and list types have better cross-language support than the large variants.
> >>>>>
> >>>>> On Sun, Jan 10, 2021 at 8:49 AM Ying Zhou <yzhou7...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> While finishing the ORC writer in C++ I found that the ORC reader treats
> >>>>>> certain types in rather awkward ways. Hence I filed this Jira ticket:
> >>>>>> https://issues.apache.org/jira/browse/ARROW-11117
> >>>>>>
> >>>>>> After starting to work on ORC tickets, mostly filed by myself, I began to
> >>>>>> worry that the type mappings in the ORC reader might already be used by
> >>>>>> users of Arrow. I wonder whether we should grandfather the existing
> >>>>>> mappings or gradually switch to a new type mapping.
> >>>>>>
> >>>>>> Here are my proposed changes:
> >>>>>> 1. The ORC STRING type should be converted to the Arrow LARGE_STRING type
> >>>>>> instead of the STRING type since it is large.
> >>>>>> 2. The ORC LIST type should be converted to the Arrow LARGE_LIST type
> >>>>>> instead of the LIST type since it is large.
> >>>>>> 3. The ORC MAP type should be converted to the Arrow MAP type instead of
> >>>>>> a list of structs with hardcoded field names, as long as the offsets fit
> >>>>>> into int32. Otherwise we shouldn't return OK.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Ying
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> > --
> > regards,
> > Deepak Majeti
>
>
