I'd like to bump this thread to see if anyone has comments. If nobody objects, I will try to start implementing the changes next week.

Thanks,
Micah

On Mon, May 20, 2019 at 9:37 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> In the past [1] there hasn't been agreement on the final requirements for
> union types.
>
> Briefly, the two approaches that are currently advocated:
> 1. Limit unions to contain only one field of each individual type (e.g.
> you can't have two separate int32 fields). Java takes this approach.
> 2. Generalized unions (unions can have any number of fields with the same
> type). C++ takes this approach.
>
> There was a prior PR [2] that stalled in trying to take this approach with
> Java. For writing vectors it seemed to be slower on a benchmark.
>
> My proposal: We should pursue option 2 (the general approach). There are
> already data interchange formats that support it, and it would be nice to
> have a data model that makes translating between those formats and Arrow
> schemas easy:
> 1. Avro seems to support it [3] (with the exception of complex types).
> 2. Protobufs loosely support it [4] via one-of.
>
> In order to address the issues in [2], I propose making the following
> changes/additions to the Java implementation:
> 1. Keep the default write path untouched with the existing class.
> 2. Add a new sparse union class, implementing the same interface, that
> can be used on the read path and when a client opts in (via direct
> construction).
> 3. Add a dense union class (I don't believe Java has one).
>
> I'm still ramping up on the Java code base, so I'd like other Java
> contributors to chime in to see if this plan sounds feasible and acceptable.
>
> Any other thoughts on unions?
>
> Thanks,
> Micah
>
> [1]
> https://lists.apache.org/thread.html/82ec2049fc3c29de232c9c6962aaee9ec022d581cecb6cf0eb6a8f36@%3Cdev.arrow.apache.org%3E
> [2] https://github.com/apache/arrow/pull/987#issuecomment-493231493
> [3] https://github.com/apache/arrow/pull/987#issuecomment-493231493
> [4] https://developers.google.com/protocol-buffers/docs/proto#oneof
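
To make option 2 concrete, here is a rough sketch of a generalized dense union in plain Python. It is only a toy model for discussion, not Arrow's Java or C++ API: the class name DenseUnion, the helper from_values, and the field names user_id and order_id are all made up for illustration. The point it tries to show is that each slot is tagged with a child index rather than a data type, so two children that both hold int32-like values stay distinguishable.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class DenseUnion:
    """Toy dense union (hypothetical, not an Arrow class).

    Each slot stores a type id (index of the child it belongs to) and an
    offset into that child's values.
    """
    field_names: List[str]
    children: List[List[Any]]
    type_ids: List[int]
    offsets: List[int]

    @classmethod
    def from_values(cls, field_names: List[str],
                    values: List[Tuple[str, Any]]) -> "DenseUnion":
        # Map each field name to its child index. Children are keyed by
        # position, not by data type, so duplicate types are allowed.
        index = {name: i for i, name in enumerate(field_names)}
        children: List[List[Any]] = [[] for _ in field_names]
        type_ids: List[int] = []
        offsets: List[int] = []
        for name, value in values:
            child = index[name]
            type_ids.append(child)
            # The offset is the position the value will take in its child.
            offsets.append(len(children[child]))
            children[child].append(value)
        return cls(list(field_names), children, type_ids, offsets)

    def get(self, slot: int) -> Tuple[str, Any]:
        # Resolve a slot to (field name, value) via type id and offset.
        child = self.type_ids[slot]
        return self.field_names[child], self.children[child][self.offsets[slot]]


# Two children with the same underlying type (both plain integers).
union = DenseUnion.from_values(
    ["user_id", "order_id"],
    [("user_id", 7), ("order_id", 1001), ("user_id", 8)],
)
```

Under option 1, where a union holds at most one field per type, user_id and order_id could not both exist as separate int32 children; a lookup keyed by type would have no way to tell them apart.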