Hi Ying,

I can help review/merge any ORC C++ contributions.


On Thu, Jan 14, 2021 at 6:57 PM Ying Zhou <yzhou7...@gmail.com> wrote:

> Well, I haven’t found any. Thankfully ORC does work and I can figure out
> how it works by testing with simple examples. However, I have never managed
> to contact the ORC community at all. They have never responded to any of my
> emails to d...@orc.apache.org. I do want to add Snappy write support (which
> was actually done 2 years ago by someone else, but it was never merged into
> master due to a lack of unit tests; I can write the tests) and maybe
> Decimal256 to ORC C++ if they are willing to review and merge them. If
> anyone has successfully contacted the ORC community, please let me know how.
>
> Best,
> Ying
>
> > On Jan 14, 2021, at 8:39 AM, Antoine Pitrou <anto...@python.org> wrote:
> >
> >
> > Hi Ying,
> >
> > Is there a semantic description of the ORC data types somewhere?
> > I've read through https://orc.apache.org/docs/types.html and
> > https://orc.apache.org/specification/ORCv1/ but those docs don't seem
> > to explain the intent and constraints of each of the data types.
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> >
> > On Mon, 11 Jan 2021 21:15:05 -0500
> > Ying Zhou <yzhou7...@gmail.com> wrote:
> >> Thanks! What about 3?
> >> Shall we convert ORC maps to Arrow maps as opposed to lists of structs
> >> with fields of the structs named ‘key’ and ‘value’?
> >>
> >>
> >>
> >>> On Jan 10, 2021, at 6:45 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> >>>
> >>> I don't think 1 & 2 make sense. I don't think there are a lot of users
> >>> reading 2 GB strings or lists with 2 billion objects in them. Saying we
> >>> just don't support that pattern seems fine for now. I also believe the
> >>> string and list types have better cross-language support than the large
> >>> variants.
> >>>
> >>> On Sun, Jan 10, 2021 at 8:49 AM Ying Zhou <yzhou7...@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> While finishing the ORC writer in C++ I found that the ORC reader treats
> >>>> certain types in rather awkward ways. Hence I filed this Jira ticket:
> >>>> https://issues.apache.org/jira/browse/ARROW-11117
> >>>>
> >>>> After starting to work on ORC tickets mostly filed by myself I began to
> >>>> worry that the type mappings in the ORC reader might already be used by
> >>>> users of Arrow. I wonder whether we should grandfather the issues or
> >>>> gradually switch to a new type mapping.
> >>>>
> >>>> Here are my proposed changes:
> >>>> 1. The ORC STRING type should be converted to the Arrow LARGE_STRING
> >>>> type instead of STRING type since it is large.
> >>>> 2. The ORC LIST type should be converted to the Arrow LARGE_LIST type
> >>>> instead of LIST type since it is large.
> >>>> 3. The ORC MAP type should be converted to the Arrow MAP type instead of
> >>>> list of structs with hardcoded field names as long as
> >>>> the offsets fit into int32. Otherwise we shouldn't return OK.
> >>>>
> >>>> Thanks,
> >>>> Ying
> >>
> >>
> >
> >
> >
>
>
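
For reference, the int32 condition in proposal 3 of the quoted message can be sketched roughly as below. This is only an illustration in plain Python, not the Arrow or ORC C++ code: the helper names `offsets_fit_int32` and `choose_map_conversion` are hypothetical, and the sketch assumes that offsets are the running total of per-row map entry counts.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def offsets_fit_int32(entry_counts):
    """Return True if the cumulative offsets for the given per-row entry
    counts all fit in a signed 32-bit integer (assumed offset model)."""
    total = 0
    for count in entry_counts:
        total += count
        if total > INT32_MAX:
            return False
    return True


def choose_map_conversion(entry_counts):
    """Hypothetical helper: decide how to convert an ORC MAP column.

    Returns the string "map" when int32 offsets suffice; otherwise raises,
    mirroring "we shouldn't return OK" from proposal 3.
    """
    if not offsets_fit_int32(entry_counts):
        raise OverflowError("map offsets exceed int32; cannot convert to Arrow MAP")
    return "map"
```

A column whose rows hold 2, 0 and 3 entries would pass the check, while one whose total entry count exceeds 2**31 - 1 would be rejected rather than silently truncated.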

-- 
regards,
Deepak Majeti
