> > We don't have any test cases that have a future metadata version. I > made a branch where I added V6 and wrote an IPC message, then found > that I was unable to determine that it was out of bounds (presumably > UBSAN would error, though, but we need a runtime error outside of > ASAN/UBSAN).
To clarify I don't think UBSAN will error on the existing generated code on future versions. I believe we had issues with parquet because the enums did not have an explicit type (compare [1] to [2]) . The version check needs to be done in our code (comparing against MAX [3]). Does that align with your expectations? So we don't get this for free, but I'm not sure I understand why this is difficult? [1] https://github.com/apache/arrow/blob/master/cpp/src/generated/parquet_types.h#L26 [2] https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L91 [3] https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109 On Mon, Jul 13, 2020 at 2:34 PM Wes McKinney <wesmck...@gmail.com> wrote: > On Mon, Jul 13, 2020 at 4:31 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > > > > > > > > > > > That static cast is currently undefined behavior. > > > > Is ubsan reporting this? When looking into the feature enum I tried to > > understand if that was valid. At the time I read the C++ spec* if the > enum > > has an explicitly declared type, all values in that types range are > > supported. > > We don't have any test cases that have a future metadata version. I > made a branch where I added V6 and wrote an IPC message, then found > that I was unable to determine that it was out of bounds (presumably > UBSAN would error, though, but we need a runtime error outside of > ASAN/UBSAN). > > > The generated enums provide a "max" [1] value that should be comparable > > against. > > < > https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109 > > > > > > > > * I am not a C++ lawyer > > > > [1] > > > https://github.com/apache/arrow/blob/master/cpp/src/generated/Schema_generated.h#L109 > > > > On Mon, Jul 13, 2020 at 2:19 PM Wes McKinney <wesmck...@gmail.com> > wrote: > > > > > I've discovered while working on ARROW-9399 that it is very difficult > > > with the Flatbuffers API in C++ to detect a MetadataVersion [1] that > > > is higher than the current version. > > > > > > For example, suppose that 3 or 4 years from now we move from version > > > V5 to version V6. The generated Flatbuffers code looks like this > > > > > > org::apache::arrow::flatbuf::MetadataVersion version() const { > > > return > > > > static_cast<org::apache::arrow::flatbuf::MetadataVersion>(GetField<int16_t>(VT_VERSION, > > > 0)); > > > } > > > > > > That static cast is currently undefined behavior. > > > > > > One way to deal with this would be to add placeholder future versions > > > (e.g. V6 and V7). > > > > > > Another backward-and-forward-compatible option would be to return the > > > version as a short (int16_t) rather than the enum value, which is > > > subject to this brittleness. > > > > > > Either way unfortunately I think adding forward compatibility checks > > > is out of scope for 1.0.0 and the risk is low since we don't > > > anticipate bumping the version anytime soon. > > > > > > Thanks, > > > Wes > > > > > > [1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22 > > > >