List<Struct<K, V>> isn’t the only physical representation that makes sense. 
Because it doesn’t take advantage of the fact that (a) keys can be re-ordered, 
(b) keys are unique.

So, another viable physical representation would be Struct<List<K>, List<V>>, 
with the keys sorted. If keys are constant width and in contiguous memory then 
binary search is very fast.

I am not claiming that this physical representation is better than yours. But 
the fact that there is a more than one means it’s not a no-brainer.

Julian
 

> On Jul 18, 2017, at 12:10 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> I recently created https://issues.apache.org/jira/browse/ARROW-1207
> and wanted to discuss on the mailing list to hear opinions about how
> to proceed.
> 
> Some systems, like Spark [1], Presto [2], or Drill have a Map<K, V>
> composite type. These are sometimes stored in Parquet as a repeated
> struct, or in Arrow types List<item: Struct<key: K, value: V>>.
> 
> While we can represent in-memory map data as List<Struct<K, V>>, it
> may be useful to add a new logical type to the set of supported
> logical types [3]. The idea is that the memory format between Map<K,
> V> and List<Struct<K, V>> is identical, so this is strictly a logical
> construct, similar to date/time values having the same in-memory
> format as the corresponding integer types (int32/int64)
> 
> For Arrow implementation that do not provide a first class Map
> container, they could process the data as though it were a repeated
> struct. It would be helpful to us in C++ to have an arrow::MapArray
> container because we could convert to / from this type and other data
> structures like Python dictionaries. It would also be helpful to
> faithfully transport the MAP logical type from Parquet [4]
> 
> Let me know what others think. One question I have is whether the
> repeated struct in-memory representation makes sense as the canonical
> map representation.
> 
> Thanks
> Wes
> 
> [1]: 
> https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/sql/types/MapType.html
> [2]: https://prestodb.io/docs/current/functions/map.html
> [3]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L157
> [4]: 
> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L63

Reply via email to