Yes, the "typeIds" field in the metadata lists the codes that correspond to each type; the actual data uses 1 byte per value.
So we might have something like

typeIds: [0, 5, 10]
typeIds buffer: [0, 5, 10, 10, 10, 10, 0, 5, 10, 0]

Relatedly, we will have to start a new mailing list discussion about
reconciling the Union format

- Wes

On Wed, Mar 20, 2019 at 3:49 AM Micah Kornfield <[email protected]> wrote:
>
> Hi Paul,
> TL;DR: I think the typeIds field you referenced is not the offset for
> dense vectors mentioned by the spec. I believe (but lack the historical
> context) that it is an outgrowth of the Java implementation that might be
> useful in other contexts.
>
> The requirement is that the typeIDs field you referenced has a length
> less than 127; the bit-width of the ID is immaterial. Also, the
> typeIDs field and unions aren't fully supported yet. There is an open PR
> [1] which got stalled on performance and long term direction concerns.
>
> I haven't fully validated this, but my rough understanding is that the Java
> implementation assumes only one array/vector of each type is in a union.
> Roughly, each logical type + Schema.fbs enum parameterization has its own
> type with its own type ID (I think the number is still less than 127 but
> might grow larger). The implementation makes use of this fact to do some
> optimizations. So when a union (I think only Sparse is supported in Java)
> serializes itself it records each of the type IDs [2] so it can easily map
> back to them.
>
> [1] https://github.com/apache/arrow/pull/987
> [2]
> https://github.com/apache/arrow/blob/73d379f4631cd3013371f60876a52615171e6c3b/java/vector/src/main/codegen/templates/UnionVector.java#L329
>
> On Wed, Mar 20, 2019 at 1:08 AM Paul Taylor <[email protected]> wrote:
>
> > I noticed the DenseUnion docs[1] say the typeIds buffer is 8-bit
> > signed integers, but in the flatbuffer schema[2] it's typed as int (and
> > flatc generates a function that returns an Int32Array).
> >
> > How are the other implementations treating this buffer, and should we
> > update the docs or the flatbuffers schema?
> >
> > Thanks,
> >
> > Paul
> >
> > 1. https://arrow.apache.org/docs/format/Layout.html#dense-union-type
> >
> > 2.
> > https://github.com/apache/arrow/blob/50bc9f49671afb56594910f49b9bf34e080a70e7/format/Schema.fbs#L92
> >
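
For what it's worth, the distinction drawn at the top of this message (type codes listed in the metadata versus one byte per value in the data) can be sketched in plain Python. This is only an illustration under assumptions: it uses the standard-library array module rather than any Arrow API, the values are the ones from the example above, and the code_to_child mapping is a hypothetical helper that assumes each code's position in the metadata list identifies its child.

```python
from array import array

# Metadata: the type codes declared for the union's children
# (values taken from the example in this thread).
type_ids = [0, 5, 10]

# Data: one signed 8-bit type code per slot ("b" is a signed char).
type_ids_buffer = array("b", [0, 5, 10, 10, 10, 10, 0, 5, 10, 0])

# Hypothetical helper (an assumption, not an Arrow API): map each type
# code to the position of its child in the metadata list.
code_to_child = {code: index for index, code in enumerate(type_ids)}

# Resolve every slot's type code to a child position.
child_indices = [code_to_child[code] for code in type_ids_buffer]

print(type_ids_buffer.itemsize)       # bytes used per value in the buffer
print(len(type_ids_buffer.tobytes())) # total bytes for the 10 slots
print(child_indices)
```

The metadata list can be stored at any integer width, since it only names the codes; the per-slot buffer is what stays at 1 byte per value in this sketch.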
