On 29/04/2021 at 02:26, Weston Pace wrote:

There is also a potential format change coming up (new interval type).

Ok, so more accurately, it is not a format change, it's a format addition ;-)

This sounds pedantic, but a format change could break compatibility (for example, if some 32-bit encoded field suddenly became 64-bit encoded). The format includes a "MetadataVersion" field which tracks those changes:
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22-L43

Conversely, adding a new DataType does not break compatibility. It's just something that not all implementations might recognize, just as they might not recognize all currently defined DataTypes.
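To make the distinction concrete, here is a minimal sketch in Python of how a reader could treat the two cases differently. Everything in it is hypothetical and for illustration only: `readable_fields`, `Field`, `IncompatibleFormat`, the set of supported versions and the type names are assumptions, not the API of any Arrow implementation.

```python
from dataclasses import dataclass

# Hypothetical reader-side model; not the API of any Arrow implementation.
SUPPORTED_METADATA_VERSIONS = {4, 5}          # assumed for illustration
KNOWN_TYPES = {"int32", "utf8", "timestamp"}  # assumed subset of DataTypes


class IncompatibleFormat(Exception):
    """The encoding itself changed, so the data cannot be decoded safely."""


@dataclass
class Field:
    name: str
    type_name: str


def readable_fields(metadata_version, fields):
    """Split fields into those this reader understands and those it does not."""
    # A format *change* is signalled by the version and is fatal.
    if metadata_version not in SUPPORTED_METADATA_VERSIONS:
        raise IncompatibleFormat(
            f"unsupported MetadataVersion {metadata_version}"
        )
    # A format *addition* only affects the fields that use the new type.
    known = [f for f in fields if f.type_name in KNOWN_TYPES]
    unknown = [f for f in fields if f.type_name not in KNOWN_TYPES]
    return known, unknown
```

Whether a real reader skips a column of an unrecognized type or rejects the whole schema is an implementation decision. The point of the sketch is only that the metadata version does not need to change for the second case.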

In the past, we don't seem to have bumped the format version when making backwards-compatible additions. I don't know if that's the optimal policy, but we should not bump the format version erratically just because this comes up in a JIRA or mailing-list discussion. If we can't discipline ourselves to do it reliably and consistently, then let's just not do it.

We also have a "Feature" field that, to my knowledge, no existing implementation supports (reads or writes):
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L45-L72
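
Purely as an illustration of what such a field could enable if implementations did read it, a reader might compare the features a stream declares against the ones it supports. This is a hypothetical sketch in Python: the enum member names are modeled on Schema.fbs, but the numeric values and the `unsupported_features` helper are assumptions, not existing code.

```python
from enum import IntEnum


class Feature(IntEnum):
    # Names modeled on format/Schema.fbs; numeric values are assumptions.
    UNUSED = 0
    DICTIONARY_REPLACEMENT = 1
    COMPRESSED_BODY = 2


def unsupported_features(declared, supported):
    """Return the declared features that this reader does not support."""
    # UNUSED is a placeholder value and never requires support.
    missing = set(declared) - set(supported) - {Feature.UNUSED}
    return sorted(missing)
```

A reader could then refuse a stream early, with a clear message, when the returned list is non-empty.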

In addition, is there value in aligning format adoption across languages?

For example, if Rust adopts format version 1.1 in version 5 and
pyarrow does not then users will need to consult a table to figure out
which versions are interoperable.

There is no interoperability breakage that I can think of here. The limitation is that some implementations may not support all datatypes, but that's already the case (hence the existing feature matrix :-)).

Regards

Antoine.
