I recently created https://issues.apache.org/jira/browse/ARROW-1207 and wanted to discuss on the mailing list to hear opinions about how to proceed.
Some systems, like Spark [1], Presto [2], or Drill have a Map<K, V> composite type. These are sometimes stored in Parquet as a repeated struct, or in Arrow types List<item: Struct<key: K, value: V>>. While we can represent in-memory map data as List<Struct<K, V>>, it may be useful to add a new logical type to the set of supported logical types [3]. The idea is that the memory format between Map<K, V> and List<Struct<K, V>> is identical, so this is strictly a logical construct, similar to date/time values having the same in-memory format as the corresponding integer types (int32/int64) For Arrow implementation that do not provide a first class Map container, they could process the data as though it were a repeated struct. It would be helpful to us in C++ to have an arrow::MapArray container because we could convert to / from this type and other data structures like Python dictionaries. It would also be helpful to faithfully transport the MAP logical type from Parquet [4] Let me know what others think. One question I have is whether the repeated struct in-memory representation makes sense as the canonical map representation. Thanks Wes [1]: https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/sql/types/MapType.html [2]: https://prestodb.io/docs/current/functions/map.html [3]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L157 [4]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L63