Hi all,

One of the conflicts that keeps coming up regarding unions is between the following two notions:

* A union as a "variant of primitives" type. Here, values are constrained to be one of Arrow's primitive types (integer, floating point, string, boolean, etc.). The value types are statically declared, and thus the union type codes have a fixed interpretation (e.g. 0 is always boolean, 1 is always int8, and so on).

* A union as a composition of any child types (including nested types). In this model, a union internally is like a struct plus type codes, which refer to a collection of any fields, which may include other nested types.

IMHO, these are two different and totally valid things to support. The former can be viewed as a special case of the latter, but there are benefits to computation engines in relying on the assumptions of the former (like the type codes having a static interpretation rather than a dynamic one).

Not having the latter union type seems troublesome to me. For example, other data serialization systems support this:

* oneof in Protocol Buffers https://developers.google.com/protocol-buffers/docs/proto#oneof
* union in Flatbuffers https://google.github.io/flatbuffers/md__schemas.html
* union in Thrift (not documented very well, unfortunately)
* union in Avro (I think this is the same)

Thanks,
Wes

On Thu, Jan 11, 2018 at 11:16 AM, Li Jin <ice.xell...@gmail.com> wrote:
> Hi All,
>
> Here is a summary of the state and issue of union vector (to the best of my
> knowledge).
>
> I have summarized some possible solutions based on the discussion so far.
> However, this is not a proposal as there are still a lot of things that are
> not clear at this moment.
>
> I'd like to share this as a base for further discussion and move towards a
> proposal. Thank you.
>
> https://docs.google.com/document/d/1zSwSZDVxgmoDol_PKfyTDHD5wbw1eALs5eTS9kyjtYU/edit?usp=sharing
>
> Li
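
To make the two notions of union discussed above a bit more concrete, here is a rough Python sketch. It is purely illustrative and does not use or mirror any Arrow API: the class names `PrimitiveVariant` and `GeneralUnion`, the `PRIMITIVE_CODES` numbering, and the string type descriptions are all hypothetical. The point is only that in the first model a type code means the same thing everywhere, while in the second it has to be resolved against the fields declared for that particular union.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Notion 1: "variant of primitives". The type codes are fixed once, globally,
# so an engine can hard-code their meaning. This numbering is hypothetical
# and is not taken from the Arrow specification.
PRIMITIVE_CODES: Dict[int, type] = {0: bool, 1: int, 2: float, 3: str}


@dataclass
class PrimitiveVariant:
    type_codes: List[int]  # one code per slot, interpreted via PRIMITIVE_CODES
    values: List[Any]      # one value per slot

    def validate(self) -> None:
        for code, value in zip(self.type_codes, self.values):
            expected = PRIMITIVE_CODES[code]
            # bool is a subclass of int in Python, so compare exact types.
            if type(value) is not expected:
                raise TypeError(f"code {code} expects {expected.__name__}")


# Notion 2: a union as a composition of arbitrary child fields. It looks like
# a struct plus type codes; a code is only meaningful relative to `fields`.
@dataclass
class GeneralUnion:
    fields: List[Tuple[str, str]]  # (child name, child type description)
    type_codes: List[int]          # index into `fields`, one per slot
    values: List[Any]              # one value per slot

    def type_of(self, slot: int) -> str:
        name, type_desc = self.fields[self.type_codes[slot]]
        return f"{name}: {type_desc}"


variant = PrimitiveVariant(type_codes=[0, 1, 3], values=[True, 7, "x"])
variant.validate()  # code 0 is bool for every PrimitiveVariant

union = GeneralUnion(
    fields=[("point", "struct<x: double, y: double>"), ("tags", "list<string>")],
    type_codes=[1, 0],
    values=[["a", "b"], {"x": 1.0, "y": 2.0}],
)
print(union.type_of(0))  # code 1 is resolved against this union's own fields
```

In this sketch the first model is a special case of the second (a `GeneralUnion` whose fields happen to be the fixed primitive list), which is the relationship described above.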