I'd like to bump this thread, to see if anyone has any comments.  If nobody
objects I will try to start implementing the changes next week.

Thanks,
Micah

On Mon, May 20, 2019 at 9:37 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> In the past [1] there hasn't been agreement on the final requirements for
> union types.
>
> Briefly the two approaches that are currently advocated:
> 1.  Limit unions to only contain one field of each individual type (e.g.
> you can't have two separate int32 fields).  Java takes this approach.
> 2.  Generalized unions (unions can have any number of fields with the same
> type).  C++ takes this approach.
>
> There was a prior PR [2] that stalled while trying to take the generalized
> approach with Java.  Writing vectors appeared to be slower on a benchmark.
>
> My proposal:  We should pursue option 2 (the general approach).  There are
> already data interchange formats that support it, and it would be nice to
> have a data model that makes translation to and from Arrow schemas easy:
> 1.  Avro seems to support it [3] (with the exception of complex types)
> 2.  Protobufs loosely support it [4] via one-of.
>
> To address the issues raised in [2], I propose making the following
> changes/additions to the Java implementation:
> 1.  Keep the default write-path untouched with the existing class.
> 2.  Add a new sparse union class that implements the same interface.  It
> would be used on the read path, and on the write path only if a client
> opts in (via direct construction).
> 3.  Add in a dense union class (I don't believe Java has one).
>
> I'm still ramping up on the Java code base, so I'd like other Java
> contributors to chime in to see if this plan sounds feasible and acceptable.
>
> Any other thoughts on Unions?
>
> Thanks,
> Micah
>
> [1]
> https://lists.apache.org/thread.html/82ec2049fc3c29de232c9c6962aaee9ec022d581cecb6cf0eb6a8f36@%3Cdev.arrow.apache.org%3E
> [2] https://github.com/apache/arrow/pull/987#issuecomment-493231493
> [3] https://github.com/apache/arrow/pull/987#issuecomment-493231493
> [4] https://developers.google.com/protocol-buffers/docs/proto#oneof
>
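
To make option 2 concrete, here is a rough sketch in plain Java of a dense
union whose children are selected by type id rather than by data type, so two
int32 children can coexist.  It is an illustration only and does not use or
mirror Arrow's Java classes; GeneralizedUnionSketch, DenseUnion, IntChild, and
the field names "width" and "height" are all hypothetical, and the choice of
"type id = position of the child" is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a generalized dense union; not Arrow's Java API. */
final class GeneralizedUnionSketch {

    /** One child of the union: a field name plus a column of int32 values. */
    static final class IntChild {
        final String name;
        final List<Integer> values = new ArrayList<>();

        IntChild(String name) {
            this.name = name;
        }
    }

    /** Dense union: each slot stores a type id and an offset into that child. */
    static final class DenseUnion {
        final List<IntChild> children = new ArrayList<>();
        final List<Byte> typeIds = new ArrayList<>();
        final List<Integer> offsets = new ArrayList<>();

        /** Registers a child; its type id is its position (an assumption). */
        byte addChild(String name) {
            children.add(new IntChild(name));
            return (byte) (children.size() - 1);
        }

        /** Appends a value to the child chosen by type id, not by data type. */
        void append(byte typeId, int value) {
            IntChild child = children.get(typeId);
            typeIds.add(typeId);
            offsets.add(child.values.size()); // index of the value about to be added
            child.values.add(value);
        }

        /** Resolves a slot to "fieldName=value" via its type id and offset. */
        String describe(int slot) {
            int typeId = typeIds.get(slot);
            int offset = offsets.get(slot);
            IntChild child = children.get(typeId);
            return child.name + "=" + child.values.get(offset);
        }
    }

    public static void main(String[] args) {
        DenseUnion union = new DenseUnion();
        byte width = union.addChild("width");   // int32 child, type id 0
        byte height = union.addChild("height"); // second int32 child, type id 1
        union.append(width, 3);
        union.append(height, 7);
        union.append(width, 5);
        for (int slot = 0; slot < union.typeIds.size(); slot++) {
            System.out.println(union.describe(slot));
        }
    }
}
```

Under option 1 the second addChild call would have no distinct slot to
occupy, because the child would be looked up by its data type (int32) alone.
Keying on a type id is what lets "width" and "height" stay separate, and the
per-slot offsets are what make the layout dense: each child only stores the
values that actually belong to it.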
