Hi all,

One of the conflicts that keeps coming up re: unions is between the
following two notions:

* A union as a "variant of primitives" type. Here, values are
constrained to be one of Arrow's primitive types (integer, floating
point, string, boolean, etc.). The value types are statically declared
and thus the union type codes have a fixed interpretation (e.g. 0 is
always boolean, 1 is always int8, and so on).

* A union as a composition of any child types (including nested
types). In this model, a union internally is like a struct plus type
codes, which refer to a collection of any fields, which may include
other nested types.

IMHO, these are two different and totally valid things to support. The
former can be viewed as a special case of the latter, but computation
engines benefit from being able to rely on the assumptions of the
former (like the type codes having a static interpretation rather than
a dynamic one).
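
To make the distinction concrete, here is a rough Python sketch of the
two notions. This is not Arrow's API or memory layout; the class names,
the string type labels, and the code-to-type numbering are all made up
for illustration.

```python
from dataclasses import dataclass
from typing import Any, List

# Notion 1: "variant of primitives". The meaning of each type code is
# fixed once, for every union of this kind (hypothetical numbering).
PRIMITIVE_CODES = {0: "bool", 1: "int8", 2: "float64", 3: "utf8"}


@dataclass
class PrimitiveVariant:
    type_codes: List[int]  # one code per value
    values: List[Any]

    def type_of(self, i: int) -> str:
        # Static interpretation: no per-union metadata is consulted.
        return PRIMITIVE_CODES[self.type_codes[i]]


# Notion 2: union of arbitrary children. Each union declares its own
# fields, and a type code only has meaning relative to that list.
@dataclass
class Field:
    name: str
    type: str  # may describe a nested type, e.g. "list<int32>"


@dataclass
class GeneralUnion:
    fields: List[Field]    # child fields declared by this union
    type_codes: List[int]  # one code per value, indexing into fields
    values: List[Any]

    def type_of(self, i: int) -> str:
        # Dynamic interpretation: look the code up in this union's fields.
        return self.fields[self.type_codes[i]].type
```

With the first class an engine can dispatch on the code alone; with the
second it has to look at the union's own field list first, which is the
price of allowing nested children.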

Not having the latter union type seems troublesome to me. For example,
other data serialization systems support this:

* oneof in Protocol Buffers
https://developers.google.com/protocol-buffers/docs/proto#oneof
* union in Flatbuffers https://google.github.io/flatbuffers/md__schemas.html
* union in Thrift (not documented very well unfortunately)
* union in Avro (I think this is the same)

Thanks
Wes

On Thu, Jan 11, 2018 at 11:16 AM, Li Jin <ice.xell...@gmail.com> wrote:
> Hi All,
>
> Here is a summary of the state and issue of union vector (to the best of my
> knowledge).
>
> I have summarized some possible solutions based on the discussion so far.
> However, this is not a proposal as there are still a lot of things that are
> not clear at this moment.
>
> I'd like to share this as a base for further discussion and move towards a
> proposal. Thank you.
>
> https://docs.google.com/document/d/1zSwSZDVxgmoDol_PKfyTDHD5wbw1eALs5eTS9kyjtYU/edit?usp=sharing
>
> Li
