Yes, the "typeIds" field in the metadata lists the codes that correspond
to each type; the actual data uses 1 byte per value.

So we might have something like

typeIds: [0, 5, 10]
typeIds buffer: [0, 5, 10, 10, 10, 10, 0, 5, 10, 0]
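As a rough sketch of how the two relate, the snippet below resolves each slot of the buffer to a child position. It assumes the i-th entry of the metadata typeIds is the code for the i-th child; the variable names are illustrative only and do not come from any Arrow implementation.

```python
from array import array

# Metadata: the type codes declared for the union, one per child (assumed order).
type_ids = [0, 5, 10]

# Data: the types buffer, stored as one signed 8-bit integer per union slot.
types_buffer = array("b", [0, 5, 10, 10, 10, 10, 0, 5, 10, 0])

# Map each type code to the position of its child within the union.
code_to_child = {code: index for index, code in enumerate(type_ids)}

# Resolve every slot in the buffer to the child it refers to.
child_indices = [code_to_child[code] for code in types_buffer]

print(types_buffer.itemsize)  # size in bytes of each stored value
print(child_indices)
```

The codes in the metadata need not be contiguous, which is why a lookup table is used here rather than treating each code as a child index directly.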

Relatedly, we will have to start a new mailing list discussion about
reconciling the Union format

- Wes

On Wed, Mar 20, 2019 at 3:49 AM Micah Kornfield <[email protected]> wrote:
>
> Hi Paul,
> TL;DR: I think the typeIds field you referenced is not the offset for
> dense vectors mentioned by the spec.  I believe (but lack the historical
> context) that it is an outgrowth of the Java implementation that might be
> useful in other contexts.
>
> The requirement for the typeIds field you referenced is that its length is
> less than 127; the bit-width of the ID is immaterial.  Also, the
> typeIds field and unions aren't fully supported yet.  There is an open PR
> [1] which got stalled on performance and long term direction concerns.
>
> I haven't fully validated this, but my rough understanding is that the Java
> implementation assumes only one array/vector of each type is in a union.
> Roughly, each logical type + Schema.fbs enum parameterization has its own
> type with its own type ID (I think the number is still less than 127 but might
> grow larger).  The implementation makes use of this fact to do some
> optimizations.  So when a union (I think only Sparse is supported in Java)
> serializes itself it records each of the type IDs [2] so it can easily map
> back to them.
>
> [1] https://github.com/apache/arrow/pull/987
> [2]
> https://github.com/apache/arrow/blob/73d379f4631cd3013371f60876a52615171e6c3b/java/vector/src/main/codegen/templates/UnionVector.java#L329
>
> On Wed, Mar 20, 2019 at 1:08 AM Paul Taylor <[email protected]> wrote:
>
> > I noticed that the DenseUnion docs[1] say the typeIds buffer is 8-bit
> > signed integers, but in the flatbuffer schema[2] it's typed as int (and
> > flatc generates a function that returns an Int32Array).
> >
> > How are the other implementations treating this buffer, and should we
> > update the docs or the flatbuffers schema?
> >
> > Thanks,
> >
> > Paul
> >
> > 1. https://arrow.apache.org/docs/format/Layout.html#dense-union-type
> >
> > 2.
> >
> > https://github.com/apache/arrow/blob/50bc9f49671afb56594910f49b9bf34e080a70e7/format/Schema.fbs#L92
> >
> >
