Thanks Wes,

This makes sense. +1 on the "Logical Types / IPC layout document". Is there a JIRA open for this?
I'll open a JIRA item to change the inheritance of string/binary in the C++ code base.

Thanks,
Micah

On Sun, Aug 14, 2016 at 10:51 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> On Fri, Aug 12, 2016 at 5:57 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> > Sorry for the late reply.
> >
> > This all sounds reasonable to me. But I'm not sure I understand exactly
> > what you mean by
> >
> >> Accordingly, in the metadata and in RPC/IPC scenarios, binary/string
> >> would be a single array unit in the buffer stream and flattened Field
> >> metadata rather than nested types (2 array units as they are
> >> presently).
> >
> >
> > The way I read it this seems to me to contradict the cross-implementation as
> > "List<UInt8-not null>"?
> >
> > Thanks,
> > Micah
> >
>
> I think we can resolve this by starting a "Logical Types and IPC/RPC
> layout" specification document.
>
> The schema metadata
> (https://github.com/apache/arrow/blob/master/format/Message.fbs) is,
> as I understand it, strictly the domain of logical types. I believe
> there is some minor conflation of the notions of primitive physical
> types and primitive logical types.
>
> While String / Binary have identical physical layouts to List<UInt8
> not null>, in the domain of logical types and IPC, what we are saying
> is that these types are:
>
> - logically speaking: primitive, non-nested types
> - their IPC layout is the flattened version of the nested List<UInt8>
>   counterpart -- a single Field node having String type (with a null
>   count, etc.), and 3 memory buffers: validity bitmap, offsets, and
>   data. Structurally on the wire / in shared memory (compared with
>   List<UInt8 not null>) the only difference is the Field metadata (since
>   if null count is 0 for the inner UInt8 values, then there is only a
>   single buffer) -- one node versus two
>
> Let me know if this does not make sense.
>
> To move this forward I propose to begin a Logical Types / IPC layout
> document and begin to document the mapping between logical types and
> their physical in-memory representation and layout on the wire.
>
> - Wes
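
For reference, here is a rough sketch of the "one node versus two" point from the quoted message, as I read it. It is only an illustration: `FieldNode`, `encode`, `flattened` and `nested` are hypothetical names, not anything from Message.fbs or the C++ code base, and the offsets are kept as a plain Python list rather than a packed int32 buffer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldNode:
    # Hypothetical stand-in for per-array metadata (not the Message.fbs schema).
    type_name: str
    length: int
    null_count: int


def encode(values: List[Optional[bytes]]) -> Tuple[bytes, List[int], bytes]:
    """Build the validity bitmap, offsets, and data for variable-length values."""
    validity = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)  # set the bit for a non-null slot
            data += value
        offsets.append(len(data))  # a null slot repeats the previous offset
    return bytes(validity), offsets, bytes(data)


def flattened(values: List[Optional[bytes]]):
    """String/Binary as a primitive logical type: one node, three buffers."""
    validity, offsets, data = encode(values)
    nulls = sum(v is None for v in values)
    return [FieldNode("String", len(values), nulls)], [validity, offsets, data]


def nested(values: List[Optional[bytes]]):
    """List<UInt8 not null>: two nodes; the child has no nulls, so only data."""
    validity, offsets, data = encode(values)
    nulls = sum(v is None for v in values)
    nodes = [
        FieldNode("List", len(values), nulls),  # owns validity + offsets
        FieldNode("UInt8", len(data), 0),       # owns the data buffer only
    ]
    return nodes, [validity, offsets, data]
```

Under these assumptions, calling `flattened` and `nested` on the same values yields identical buffers; only the number of field nodes differs, which is the distinction the layout document would need to spell out.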