On 29/04/2021 at 02:26, Weston Pace wrote:
> There is also a potential format change coming up (new interval type).
Ok, so more accurately, it is not a format change, it's a format
addition ;-)
This may sound pedantic, but a format change could break
compatibility (for example, if a 32-bit encoded field suddenly
became 64-bit encoded). The format includes a "MetadataVersion" field
which tracks those changes:
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L22-L43
Conversely, adding a new DataType does not break compatibility. It's
just something that not all implementations might recognize, just as
they might not recognize all currently defined DataTypes.
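
To make the distinction concrete, here is a minimal sketch in Python of
how a reader could treat the two cases differently. It is only an
illustration: the names (check_readable, SUPPORTED_TYPES, Field) and the
values are hypothetical and are not taken from Schema.fbs or from any
Arrow implementation.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not taken from Schema.fbs.
SUPPORTED_METADATA_VERSION = 5
SUPPORTED_TYPES = {"int32", "int64", "utf8"}


@dataclass
class Field:
    name: str
    type_name: str


def check_readable(metadata_version, fields):
    """Return the names of the fields this reader cannot decode.

    A newer metadata version means the encoding itself may have changed,
    so the whole input is rejected. An unrecognized type only affects
    the fields that use it.
    """
    if metadata_version > SUPPORTED_METADATA_VERSION:
        raise ValueError(f"unsupported metadata version {metadata_version}")
    return [f.name for f in fields if f.type_name not in SUPPORTED_TYPES]
```

With this sketch, a schema containing a type the reader does not know
still passes the version check, and only the affected field is
reported; a higher metadata version is refused outright.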
In the past, we do not seem to have bumped the format version when making
backwards-compatible additions. I don't know if that's the optimal
policy, but we should not bump the format version erratically just
because the question comes up in a JIRA or mailing-list discussion. If we
can't discipline ourselves to do it reliably and consistently, then let's
just not do it.
We also have a "Feature" field that, to my knowledge, is supported
(read, written) by no existing implementation:
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L45-L72
> In addition, is there value in aligning format adoption across languages?
> For example, if Rust adopts format version 1.1 in version 5 and
> pyarrow does not then users will need to consult a table to figure out
> which versions are interoperable.
There is no interoperability breakage that I can think of here. There
is a limitation that some implementations may not support all datatypes,
but that's the case already (hence the feature matrix that already
exists :-)).
Regards
Antoine.