Hi Antoine,

For Java, the physical child id is the same as the logical type code, because the index of each child vector is the code (ordinal) of the vector's minor type.

This leads to a problem: only a single child vector of each type can exist in a union vector. So, strictly speaking, the Java implementation is not consistent with the Arrow specification. (Micah pointed this out long ago.)
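To make the two readings of the "types" buffer concrete, here is a minimal sketch in plain Python. It is not Arrow code: the child list, the type codes 5 and 7, the `MINOR_TYPES` table and all function names are hypothetical and only illustrate the idea under those assumptions.

```python
# Hypothetical union with two children. The physical child id is the
# position in `children`; the logical type code is declared separately.
children = ["int32", "utf8"]
type_codes = [5, 7]  # assumed codes, chosen only for illustration


def child_index_from_code(code):
    """Read a types-buffer entry as a logical type code."""
    return type_codes.index(code)


def child_index_from_id(child_id):
    """Read a types-buffer entry as a physical child id."""
    if not 0 <= child_id < len(children):
        raise IndexError(f"no child with id {child_id}")
    return child_id


# Stand-in for the ordinals of a minor-type enum (not the real Java enum).
MINOR_TYPES = ["int32", "utf8"]


def java_style_children(child_types):
    """Place each child at the ordinal of its type, as described above.

    Because the slot is derived from the type alone, a second child of
    the same type has nowhere to go.
    """
    slots = {}
    for type_name in child_types:
        code = MINOR_TYPES.index(type_name)
        if code in slots:
            raise ValueError(f"union already has a child of type {type_name}")
        slots[code] = type_name
    return slots


types_buffer = [5, 7, 5]
resolved = [children[child_index_from_code(c)] for c in types_buffer]
```

With the same buffer, `child_index_from_id(5)` fails, which is why the two readings cannot be mixed. `java_style_children(["int32", "int32"])` raises, which mirrors the one-vector-per-type limitation.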
Best,
Liya Fan

On Tue, Nov 26, 2019 at 9:59 PM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:

> It seems that the array_union_test.cc does the latter, look at how
> `expected_types` is constructed. I opened
> https://issues.apache.org/jira/browse/ARROW-7265 .
>
> Wes, is the intended usage of type_ids to allow a producer to pass a
> subset columns of unions without modifying the type codes?
>
> François
>
>
> On Thu, Nov 21, 2019 at 10:51 AM Antoine Pitrou <anto...@python.org>
> wrote:
> >
> >
> > Hello,
> >
> > There's some ambiguity whether a union array's "types" buffer stores
> > physical child ids, or logical type codes.
> >
> > Some of our C++ tests assume the former:
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/array_union_test.cc#L107-L123
> >
> > Some of our C++ tests assume the latter:
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/array_union_test.cc#L311-L326
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/json_simple_test.cc#L943-L955
> >
> > Critically, no validation of union data is currently implemented in C++
> > (ARROW-6157). I can't parse the Java source code.
> >
> > Regards
> >
> > Antoine.
> >
>