Hello Arrow devs,

I'm working on some breaking changes to how the C++ implementation handles type equality for the field names within ListType and MapType. [1] I call these "internal field names" because, unlike the fields in StructType, they often don't provide much information that isn't already implied by their position. (Dewey did note some exceptions to this in the Jira. [2])

The PR adds an option called "check_internal_field_names" that controls whether these names are compared when checking the equality of these types. Currently, we do this inconsistently: we always check them for ListType but never for MapType.

In the C++ implementation, we have two equality methods: a strict one and a loose one. The strict one, TypeEquals, checks field metadata by default, while the loose one, DataType::Equals, does not. Given this precedent, I made the default for "check_internal_field_names" in each method match that method's default for the "check_metadata" flag. Note that both settings are configurable in either method; the methods just have different defaults.

This work anticipates turning on compliant nested types in Parquet. [3] Parquet requires that ListTypes are always written with the "element" field name and has specific requirements for MapType as well. This means these types can lose their original field names when round-tripped through Parquet, so it's helpful to be able to check equality while ignoring these field names.

Of course, changes like these can have unintended consequences, so I wanted to alert other developers. If you have feedback or concerns, please raise them.

Best,
Will Jones

[1] https://github.com/apache/arrow/pull/13851
[2] https://issues.apache.org/jira/browse/ARROW-14999?focusedCommentId=17581439&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17581439
[3] https://issues.apache.org/jira/browse/ARROW-14196
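
P.S. In case a concrete picture helps, below is a minimal, self-contained C++ sketch of the comparison semantics described above. It is a toy model and not the Arrow API: ToyType, ToyField, ToyTypeEquals, and the Int32/List helpers are hypothetical names made up for illustration, and only the flag name "check_internal_field_names" comes from the PR. The sketch assumes "item" as the list field name on the Arrow side and uses "element" as the name Parquet requires.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: hypothetical stand-ins, not Arrow classes.
struct ToyType;

struct ToyField {
  std::string name;
  std::shared_ptr<ToyType> type;
};

struct ToyType {
  enum Kind { kInt32, kList, kStruct } kind;
  std::vector<ToyField> children;  // one child for kList, any number for kStruct
};

// Compares two toy types. Struct field names are always compared; the
// "internal" field name of a list is compared only when the flag is set.
bool ToyTypeEquals(const ToyType& a, const ToyType& b,
                   bool check_internal_field_names) {
  if (a.kind != b.kind || a.children.size() != b.children.size()) return false;
  const bool names_matter =
      a.kind == ToyType::kStruct || check_internal_field_names;
  for (std::size_t i = 0; i < a.children.size(); ++i) {
    const ToyField& fa = a.children[i];
    const ToyField& fb = b.children[i];
    if (names_matter && fa.name != fb.name) return false;
    if (!ToyTypeEquals(*fa.type, *fb.type, check_internal_field_names)) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper constructors for the toy types.
std::shared_ptr<ToyType> Int32() {
  return std::make_shared<ToyType>(ToyType{ToyType::kInt32, {}});
}

std::shared_ptr<ToyType> List(std::string item_name,
                              std::shared_ptr<ToyType> value_type) {
  return std::make_shared<ToyType>(ToyType{
      ToyType::kList, {ToyField{std::move(item_name), std::move(value_type)}}});
}
```

With these definitions, List("item", Int32()) and List("element", Int32()) compare equal when the flag is false and unequal when it is true, which is the Parquet round-trip case from the motivation above.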