Two questions come to mind:
1. Is it useful to have fixed-width list types independent of binary types?
2. Should binary/string types have their own separate memory layout, i.e. be a primitive type?
IMO, the answer to 1 is yes. Another use case where this is handy is the output of the aggregate functions "histogram_numeric" and "percentile_approx" in Apache Hive [1].

For #2, I'm still not sure I see a clear benefit or harm either way. The benefit of giving them their own type is that, by definition, you don't need to worry about ill-formed arrays (e.g. having a byte declared null). The potential cost is more code to deal with the additional types (although we end up paying some of this cost even if we treat everything as a list). Jacques, can you elaborate on where you see harm in the reduction?

If we can agree on the first question, it might pay to handle the discussion of bytes/strings as a primitive type on a separate thread. (I think it got lost previously because many issues surfaced in the same e-mail and we lacked time for a Google Hangout. I apologize for that.)

Thanks,
Micah

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

On Tue, Jul 12, 2016 at 5:44 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> Completely in support of fixed bit width types. Just thinking that it
> shouldn't be done by using a list.
>
> Not sure how the two are orthogonal. What am I missing?
>
> On Tue, Jul 12, 2016 at 5:38 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> I think it would be good to revisit that discussion. This is somewhat
>> orthogonal -- i.e. having a fixed-width binary type that does not have
>> an accompanying list of n + 1 offsets.
>>
>> On Tue, Jul 12, 2016 at 5:36 PM, Jacques Nadeau <jacq...@apache.org>
>> wrote:
>> > I was further reflecting on the previous discussion on lists and
>> > binary/utf8. I think that treating strings (binary or utf8) as lists is
>> > too much of a reduction. This seems like a good example of how they are
>> > treated differently (beyond the previously discussed not-possible-nullability).
>> > As such I'm -1 on this change and would prefer that we go back and
>> > further review the concept of treating a string of bits or bytes as a
>> > "primitive" type.
>> >
>> > On Tue, Jul 12, 2016 at 5:19 PM, Wes McKinney <wesmck...@gmail.com>
>> > wrote:
>> >
>> >> I'm +1 on this. I've seen fixed-width strings and other things in many
>> >> different contexts. I would say that fixed-width binary is probably
>> >> the primary use case, but you could imagine casting int96 data to
>> >> fixed_list<3, int32>.
>> >>
>> >> On Mon, Jul 11, 2016 at 11:24 PM, Micah Kornfield
>> >> <emkornfi...@gmail.com> wrote:
>> >> > This came up in a code review a while ago, but what do people think
>> >> > of adding a fixed-width list type to the memory layout spec?
>> >> >
>> >> > This would have the same layout as the current list type. Instead of
>> >> > having a separate offset buffer to determine the location and length
>> >> > of each list, the length would be stored as part of the metadata and
>> >> > offsets would be calculated using multiplication instead of lookups.
>> >> >
>> >> > One use case for this is an easy mapping to "FIXED_LEN_BYTE_ARRAY"
>> >> > in Parquet.
>> >> >
>> >> > If people like the idea I can file a JIRA and update the current
>> >> > layout.md.
>> >> >
>> >> > Thanks,
>> >> > -Micah
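To make the layout difference discussed in this thread concrete, here is a minimal Python sketch. It is not Arrow code: the names ListArray, FixedSizeListArray and int96_to_fixed_list are hypothetical. The variable-length list locates slot i with two lookups into an offsets buffer of n + 1 entries, while the proposed fixed-width list keeps the list size as type metadata and computes the start as i * list_size. The int96 cast mirrors Wes's fixed_list<3, int32> example and assumes the 12-byte values are packed little-endian.

```python
import struct
from typing import Optional, Sequence


class ListArray:
    """Variable-length list: an offsets buffer of n + 1 entries locates each slot."""

    def __init__(self, values: Sequence, offsets: Sequence[int],
                 validity: Optional[Sequence[bool]] = None):
        self.values = values
        self.offsets = offsets
        self.validity = validity

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def __getitem__(self, i: int) -> Optional[list]:
        if not 0 <= i < len(self):
            raise IndexError(i)
        if self.validity is not None and not self.validity[i]:
            return None
        # Two lookups in the offsets buffer give the start and end of slot i.
        return list(self.values[self.offsets[i]:self.offsets[i + 1]])


class FixedSizeListArray:
    """Fixed-width list: list_size is metadata, so no offsets buffer is stored."""

    def __init__(self, values: Sequence, list_size: int,
                 validity: Optional[Sequence[bool]] = None):
        # Every slot, including a null one, occupies exactly list_size values.
        if list_size <= 0 or len(values) % list_size != 0:
            raise ValueError("values length must be a multiple of list_size")
        self.values = values
        self.list_size = list_size
        self.validity = validity

    def __len__(self) -> int:
        return len(self.values) // self.list_size

    def __getitem__(self, i: int) -> Optional[list]:
        if not 0 <= i < len(self):
            raise IndexError(i)
        if self.validity is not None and not self.validity[i]:
            return None
        # The start offset is computed by multiplication instead of a lookup.
        start = i * self.list_size
        return list(self.values[start:start + self.list_size])


def int96_to_fixed_list(raw: bytes) -> FixedSizeListArray:
    """Hypothetical cast: reinterpret packed 12-byte values as fixed_list<3, int32>."""
    if len(raw) % 12 != 0:
        raise ValueError("int96 data must be a multiple of 12 bytes")
    # Assumption: little-endian packing of the three 32-bit words.
    words = list(struct.unpack("<%di" % (len(raw) // 4), raw))
    return FixedSizeListArray(words, 3)


if __name__ == "__main__":
    var = ListArray([1, 2, 3, 4, 5, 6], [0, 2, 2, 6], validity=[True, False, True])
    print(var[0], var[1], var[2])  # [1, 2] None [3, 4, 5, 6]

    fixed = FixedSizeListArray([1, 2, 3, 4, 5, 6], 2)
    print(list(fixed))  # [[1, 2], [3, 4], [5, 6]]

    arr = int96_to_fixed_list(struct.pack("<3i", 7, 8, 9) + struct.pack("<3i", -1, 0, 1))
    print(arr[0], arr[1])  # [7, 8, 9] [-1, 0, 1]
```

One trade-off the sketch makes visible: in the fixed-width layout a null slot still consumes list_size child values, whereas the variable-length layout can give a null slot an empty offset range.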