Hello Arrow Team,

My name is Igor Guzenko. I'm currently working on a task related to complex types in Apache Drill [1], and I ran into the problem that Drill has no appropriate vector for representing the canonical (Java-like) Map data type [2]. So I'm looking for inspiration on how an efficient columnar map vector can be implemented.

I believe such a map can be composed of three value vectors (as in Hive):
1) a keys vector;
2) a values vector;
3) an offsets vector that points to the start index of each map in the two previous vectors.

But there is a major issue with such an implementation: it is hard to retrieve a value by key quickly, and some advanced tricks are required to do this efficiently.

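For illustration, the three-vector layout described above could be sketched in Python roughly as follows. This is not Drill, Hive, or Arrow code: the class name `OffsetMapColumn` and its methods are hypothetical, plain Python lists stand in for value vectors, and storing one extra leading offset (so that row i spans offsets[i] to offsets[i + 1]) is an assumption made to keep the slicing simple.

```python
from typing import Any, Dict, List, Optional


class OffsetMapColumn:
    """Hypothetical columnar map: flat keys, flat values, per-row offsets."""

    def __init__(self) -> None:
        self.keys: List[Any] = []      # keys of all rows, back to back
        self.values: List[Any] = []    # values, aligned with self.keys
        # Assumption: offsets has one more entry than there are rows.
        # Row i occupies positions offsets[i] .. offsets[i + 1] - 1.
        self.offsets: List[int] = [0]

    def append(self, row: Dict[Any, Any]) -> None:
        """Append one map (one row) to the column."""
        for key, value in row.items():
            self.keys.append(key)
            self.values.append(value)
        self.offsets.append(len(self.keys))

    def __len__(self) -> int:
        """Number of rows (maps) stored in the column."""
        return len(self.offsets) - 1

    def get(self, row_index: int, key: Any) -> Optional[Any]:
        """Return the value for `key` in the given row, or None if absent."""
        start = self.offsets[row_index]
        end = self.offsets[row_index + 1]
        # Linear scan over this row's entries only.
        for i in range(start, end):
            if self.keys[i] == key:
                return self.values[i]
        return None


col = OffsetMapColumn()
col.append({"a": 1, "b": 2})
col.append({})            # an empty map takes no space in keys/values
col.append({"b": 30})
```

In this sketch `get` scans one row's slice linearly, so its cost grows with the number of entries in that row. That is the retrieval cost mentioned above; the sketch makes no attempt at the tricks needed to avoid it.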
I would be happy if you could share your expertise on this topic. After reading through some of the history of changes in Arrow, I found that the old map vector was renamed to struct, and that the map data type was declared as a list of structs, each containing vectors for keys and values. I'm still very interested in how maps work internally in Arrow, and I'd like to implement a similar one in Drill (so that future integration with Arrow goes more smoothly). Also, if you need a new vector for Map too, I would be happy to contribute it to both the Drill and Arrow projects.

[1]: https://issues.apache.org/jira/browse/DRILL-3290
[2]: https://github.com/paul-rogers/drill/wiki/Drill-Maps

Thanks for your attention,
Igor Guzenko