Hi,

Thanks a lot, Deepak!

I would really like to change the ORC reader to read ORC MAPs as Arrow MAPs now,
and it’s not a serious hassle to do so. Does anyone need the
read-ORC-maps-as-lists-of-structs behavior? If not, I will likely do it in
my current PR.

Ying
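As a rough illustration of what reading ORC MAPs as Arrow MAPs involves, here is a minimal C++ sketch that uses only the standard library. The `OrcMapColumn` and `ArrowStyleMap` structs and the `ToArrowStyleMap` function are hypothetical stand-ins, not the ORC or Arrow APIs. The sketch assumes the reader receives 64-bit map offsets plus flat key and value children, and it fails when an offset does not fit into Arrow's 32-bit MAP offsets, which is the "otherwise we shouldn't return OK" case proposed further down in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an ORC map column as handed over by the reader:
// 64-bit offsets (size = num_rows + 1) into flat key/value child columns.
struct OrcMapColumn {
  std::vector<int64_t> offsets;
  std::vector<std::string> keys;
  std::vector<int64_t> values;
};

// Hypothetical stand-in for the Arrow MAP layout: int32 offsets plus
// parallel key/value children (physically a list<struct<key, value>>).
struct ArrowStyleMap {
  std::vector<int32_t> offsets;
  std::vector<std::string> keys;
  std::vector<int64_t> values;
};

// Narrows every offset to int32. Returns std::nullopt (an error status in a
// real reader) if any offset is negative or exceeds INT32_MAX.
std::optional<ArrowStyleMap> ToArrowStyleMap(const OrcMapColumn& in) {
  ArrowStyleMap out;
  out.offsets.reserve(in.offsets.size());
  for (int64_t off : in.offsets) {
    if (off < 0 || off > std::numeric_limits<int32_t>::max()) {
      return std::nullopt;
    }
    out.offsets.push_back(static_cast<int32_t>(off));
  }
  // Children are copied unchanged; only the offsets change width.
  out.keys = in.keys;
  out.values = in.values;
  return out;
}
```

A real conversion would also have to reject null keys, because Arrow MAP keys are non-nullable; the `std::string` keys here sidestep that question.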

> On Jan 19, 2021, at 8:45 PM, Deepak Majeti <majeti.dee...@gmail.com> wrote:
> 
> Hi Ying,
> 
> I can help review/merge any ORC C++ contributions.
> 
> 
> On Thu, Jan 14, 2021 at 6:57 PM Ying Zhou <yzhou7...@gmail.com> wrote:
> 
>> Well, I haven’t found any. Thankfully ORC does work, and I can figure out
>> how it behaves by testing simple examples. However, I have never managed
>> to reach the ORC community at all: nobody has responded to any of my
>> emails to d...@orc.apache.org <mailto:d...@orc.apache.org>. I do want to add
>> Snappy support on the write path (someone else already implemented it 2
>> years ago, but due to the lack of unit tests it was never merged into
>> master; I can write the tests) and maybe Decimal256 to ORC C++, if they are
>> willing to review and merge them. If anyone has successfully contacted the
>> ORC community, please let me know how.
>> 
>> Best,
>> Ying
>> 
>>> On Jan 14, 2021, at 8:39 AM, Antoine Pitrou <anto...@python.org> wrote:
>>> 
>>> 
>>> Hi Ying,
>>> 
>>> Is there a semantic description of the ORC data types somewhere?
>>> I've read through https://orc.apache.org/docs/types.html and
>>> https://orc.apache.org/specification/ORCv1/ but those docs don't seem
>>> to explain the intent and constraints of each of the data types.
>>> 
>>> Regards
>>> 
>>> Antoine.
>>> 
>>> 
>>> 
>>> 
>>> On Mon, 11 Jan 2021 21:15:05 -0500
>>> Ying Zhou <yzhou7...@gmail.com> wrote:
>>>> Thanks! What about 3?
>>>> Shall we convert ORC maps to Arrow maps as opposed to lists of structs
>>>> with fields of the structs named ‘key’ and ‘value’?
>>>> 
>>>> 
>>>> 
>>>>> On Jan 10, 2021, at 6:45 PM, Jacques Nadeau <jacq...@apache.org> wrote:
>>>>> 
>>>>> I don't think 1 & 2 make sense. I don't think there are a lot of users
>>>>> reading 2 GB strings or lists with 2 billion objects in them. Saying we
>>>>> just don't support that pattern seems fine for now. I also believe the
>>>>> string and list types have better cross-language support than the large
>>>>> variants.
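The "just don't support that pattern" option amounts to checking, while building 32-bit Arrow offsets, that a column's cumulative string bytes stay within INT32_MAX. Below is a minimal standard-library C++ sketch; `LengthsToInt32Offsets` is a hypothetical helper, and its input is assumed to be per-value lengths already decoded into 64-bit integers (for example, from an ORC string column's length stream).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Builds Arrow-style int32 offsets (size n + 1) from n per-value lengths.
// Returns std::nullopt when a length is negative or the running total would
// exceed INT32_MAX, i.e. when the data would need LARGE_STRING; a reader
// taking this approach would report an error instead of returning OK.
std::optional<std::vector<int32_t>> LengthsToInt32Offsets(
    const std::vector<int64_t>& lengths) {
  constexpr int64_t kMax = std::numeric_limits<int32_t>::max();
  std::vector<int32_t> offsets;
  offsets.reserve(lengths.size() + 1);
  offsets.push_back(0);
  int64_t total = 0;  // always <= kMax, so kMax - total cannot overflow
  for (int64_t len : lengths) {
    if (len < 0 || len > kMax - total) {
      return std::nullopt;  // corrupt length, or offsets would overflow int32
    }
    total += len;
    offsets.push_back(static_cast<int32_t>(total));
  }
  return offsets;
}
```

The same check applies to list offsets; only when it fails would the 64-bit LARGE_STRING or LARGE_LIST types be needed.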
>>>>> 
>>>>> On Sun, Jan 10, 2021 at 8:49 AM Ying Zhou <yzhou7...@gmail.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> While finishing the ORC writer in C++, I found that the ORC reader treats
>>>>>> certain types in rather awkward ways. Hence I filed this Jira ticket:
>>>>>> https://issues.apache.org/jira/browse/ARROW-11117
>>>>>>
>>>>>> After starting to work on ORC tickets (mostly filed by myself), I began
>>>>>> to worry that the type mappings in the ORC reader might already be relied
>>>>>> on by users of Arrow. I wonder whether we should grandfather the current
>>>>>> mappings or gradually switch to a new type mapping.
>>>>>> 
>>>>>> Here are my proposed changes:
>>>>>> 1. The ORC STRING type should be converted to the Arrow LARGE_STRING
>>>>>> type instead of the STRING type, since ORC strings can be large (beyond
>>>>>> what 32-bit offsets can address).
>>>>>> 2. The ORC LIST type should be converted to the Arrow LARGE_LIST type
>>>>>> instead of the LIST type, for the same reason.
>>>>>> 3. The ORC MAP type should be converted to the Arrow MAP type instead of
>>>>>> a list of structs with hardcoded field names, as long as the offsets fit
>>>>>> into int32. Otherwise the reader should return an error rather than OK.
>>>>>> 
>>>>>> Thanks,
>>>>>> Ying
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> -- 
> regards,
> Deepak Majeti
