This sounds like a bug from an interop perspective. Record batches, including those with null columns, should, I think, be round-trippable (I'm surprised this isn't covered in our integration testing).
On Mon, Jan 11, 2021 at 8:44 AM Matt Boughen <matt.boug...@gmail.com> wrote:

> Hi
>
> I'm confused by NullVector::getField.
> It is a FieldVector so, according to the javadoc, should be "A vector
> corresponding to a Field in the schema."
> However, getField unconditionally returns a new field - with the name
> "$data$" (which is clearly intentional:
> https://github.com/apache/arrow/pull/1193). It does not return the field
> in the schema that was used to create the vector.
> (there is some indication in that PR that NullVector/ZeroVector is only
> ever supposed to be an inner vector?)
> I have noticed this because for some types, pyarrow serialises fields of
> an empty data table to null-type Arrow fields (
> https://github.com/apache/arrow/issues/2110). When deserializing in Java,
> the original schema field names are not contained in the names of
> VectorSchemaRoot::getFieldVectors (and instead the "$data$" names are
> exposed).
>
> Is this a bug?
>
> Best
> Matt
>
> (p.s. I emailed u...@arrow.apache.org using the web GUI but it still
> hasn't shown up, so am trying this instead)
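
To make the reported mismatch concrete, here is a minimal stand-in sketch in plain Java. It does not use the Arrow library: `Field`, `NullVectorModel`, `getSchemaField`, and `fieldNames` are hypothetical names that only imitate the behaviour described in the quoted message (a vector created from a schema field whose `getField` returns a fresh field named "$data$"), and "my_column" is an invented column name.

```java
import java.util.ArrayList;
import java.util.List;

public class NullVectorFieldNameSketch {

    // Hypothetical stand-in for a schema field: only the name matters here.
    static final class Field {
        final String name;
        Field(String name) { this.name = name; }
    }

    // Hypothetical model of the behaviour reported in the thread; this is
    // NOT the Arrow class, just an imitation of the described getField.
    static final class NullVectorModel {
        private final Field schemaField;

        NullVectorModel(Field schemaField) { this.schemaField = schemaField; }

        // Always returns a new field named "$data$", ignoring the schema field.
        Field getField() { return new Field("$data$"); }

        // What a round trip would need: the field the vector was created from.
        Field getSchemaField() { return schemaField; }
    }

    // Collects the names a caller would see when iterating the vectors.
    static List<String> fieldNames(List<NullVectorModel> vectors) {
        List<String> names = new ArrayList<>();
        for (NullVectorModel v : vectors) {
            names.add(v.getField().name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<NullVectorModel> vectors = new ArrayList<>();
        vectors.add(new NullVectorModel(new Field("my_column")));

        System.out.println("schema name:   " + vectors.get(0).getSchemaField().name);
        System.out.println("exposed names: " + fieldNames(vectors));
    }
}
```

In this model the schema name is lost as soon as a caller goes through `getField`, which is the round-trip failure being discussed; returning the stored field instead would preserve it.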