List<Struct<K, V>> isn’t the only physical representation that makes sense. Because it doesn’t take advantage of the fact that (a) keys can be re-ordered, (b) keys are unique.
So, another viable physical representation would be Struct<List<K>, List<V>>, with the keys sorted. If keys are constant width and in contiguous memory then binary search is very fast. I am not claiming that this physical representation is better than yours. But the fact that there is a more than one means it’s not a no-brainer. Julian > On Jul 18, 2017, at 12:10 PM, Wes McKinney <wesmck...@gmail.com> wrote: > > I recently created https://issues.apache.org/jira/browse/ARROW-1207 > and wanted to discuss on the mailing list to hear opinions about how > to proceed. > > Some systems, like Spark [1], Presto [2], or Drill have a Map<K, V> > composite type. These are sometimes stored in Parquet as a repeated > struct, or in Arrow types List<item: Struct<key: K, value: V>>. > > While we can represent in-memory map data as List<Struct<K, V>>, it > may be useful to add a new logical type to the set of supported > logical types [3]. The idea is that the memory format between Map<K, > V> and List<Struct<K, V>> is identical, so this is strictly a logical > construct, similar to date/time values having the same in-memory > format as the corresponding integer types (int32/int64) > > For Arrow implementation that do not provide a first class Map > container, they could process the data as though it were a repeated > struct. It would be helpful to us in C++ to have an arrow::MapArray > container because we could convert to / from this type and other data > structures like Python dictionaries. It would also be helpful to > faithfully transport the MAP logical type from Parquet [4] > > Let me know what others think. One question I have is whether the > repeated struct in-memory representation makes sense as the canonical > map representation. > > Thanks > Wes > > [1]: > https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/sql/types/MapType.html > [2]: https://prestodb.io/docs/current/functions/map.html > [3]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L157 > [4]: > https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L63