Re: Proposed new type: Fixed width list

Micah Kornfield Tue, 12 Jul 2016 22:42:25 -0700

Two questions come to mind.
1.  Is it useful to have fixed-width list types separately from
binary types?
2.  Should binary/string types have their own separate memory
layout/be a primitive type?


IMO, the answer to 1 is yes.  Another example of a use case where
this is handy is the output of the aggregate functions
"histogram_numeric" and "percentile_approx" in Apache Hive [1].

For #2, I'm still not sure I see a clear benefit or harm either
way.  The benefit of binary/string types having their own type is
that, by definition, you don't need to worry about ill-formed arrays
(e.g. having a byte declared null).  The potential cost is more code
to deal with the additional types (although we end up paying this cost
a little bit even if we treat everything as a list).

Jacques, can you elaborate more on where you see harm in the reduction?
If we can agree on the first question, it might pay to handle the
discussion of bytes/string as a primitive type on a separate thread.
(I think it got lost previously because many issues surfaced in the
same e-mail and there was not enough time to do a Google Hangout.  I
apologize for that.)

Thanks,
Micah

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

On Tue, Jul 12, 2016 at 5:44 PM, Jacques Nadeau <[email protected]> wrote:
> Completely in support of fixed bit width types. Just thinking that it
> shouldn't be done by using a list.
>
> Not sure how the two are orthogonal. What am I missing?
>
> On Tue, Jul 12, 2016 at 5:38 PM, Wes McKinney <[email protected]> wrote:
>>
>> I think it would be good to revisit that discussion. This is somewhat
>> orthogonal -- i.e. having a fixed-width binary type that does not have
>> an accompanying list of n + 1 offsets.
>>
>> On Tue, Jul 12, 2016 at 5:36 PM, Jacques Nadeau <[email protected]> wrote:
>> > I was further reflecting on the previous discussion on lists and
>> > binary/utf8. I think that treating strings (binary or utf8) as lists
>> > is too much of a reduction. This seems like a good example of how
>> > they are treated differently (beyond the previously discussed
>> > not-possible-nullability). As such I'm -1 on this change and would
>> > prefer if we go back and further review the concept of treating a
>> > string of bits or bytes as a "primitive" type.
>> >
>> > On Tue, Jul 12, 2016 at 5:19 PM, Wes McKinney <[email protected]> wrote:
>> >
>> >> I'm +1 on this. I've seen fixed-width strings and other things in many
>> >> different contexts. I would say that fixed-width binary is probably
>> >> the primary use case, but you could imagine casting int96 data to
>> >> fixed_list<3, int32>.
>> >>
>> >> On Mon, Jul 11, 2016 at 11:24 PM, Micah Kornfield <[email protected]> wrote:
>> >> > This came up in a code review a while ago, but what do people think
>> >> > of adding a fixed width list type to the memory layout spec?
>> >> >
>> >> > This would have the same layout as the current list type, except
>> >> > that instead of having a separate offset buffer to determine the
>> >> > location and length of each list, the length would be stored as
>> >> > part of the metadata and offsets would be calculated using
>> >> > multiplication instead of lookups.
>> >> >
>> >> > One use case for this is an easy mapping to the
>> >> > "FIXED_LEN_BYTE_ARRAY" type in Parquet.
>> >> >
>> >> > If people like the idea I can file a JIRA and update the current
>> >> > layout.md.
>> >> >
>> >> > Thanks,
>> >> > -Micah
>> >>
>
>
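
For readers following the thread, the difference between the two layouts
in the original proposal quoted above can be sketched roughly as follows.
This is only an illustration, not code from any spec or implementation:
the function names, the flat Python list standing in for the child values
buffer, and the example data are all assumptions made for this sketch.

```python
from typing import List, Sequence


def variable_list_slot(values: Sequence[int], offsets: Sequence[int], i: int) -> List[int]:
    """Variable-width list: slot i is found by looking up two entries
    in an offsets buffer that holds n + 1 entries for n slots."""
    return list(values[offsets[i]:offsets[i + 1]])


def fixed_list_slot(values: Sequence[int], width: int, i: int) -> List[int]:
    """Fixed-width list: the width comes from metadata, so the start of
    slot i is computed by multiplication and no offsets buffer is needed."""
    start = i * width
    return list(values[start:start + width])


# Hypothetical example data: two slots of three int32-like values each.
values = [10, 11, 12, 20, 21, 22]
offsets = [0, 3, 6]   # only needed by the variable-width layout
width = 3             # stored once, as metadata, in the fixed-width layout

# Both layouts address the same child values for every slot.
for i in range(2):
    assert variable_list_slot(values, offsets, i) == fixed_list_slot(values, width, i)
```

The fixed-width form trades the n + 1 offsets for a single integer, at the
cost of requiring every slot to have the same length.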
