> On Feb 25, 2019, at 8:02 PM, Ihor Huzenko <ihor.huzenko....@gmail.com> wrote:
>
> Hello Arrow Team,
>
> My name is Igor Guzenko. I'm currently working on a task related to
> complex types in Apache Drill [1], and I ran into an issue: Drill
> has no appropriate vector for representing the canonical (Java-like)
> Map datatype [2]. So I'm looking for inspiration on how an efficient
> columnar map vector can be implemented. I believe that such a map can be
> composed of three value vectors (like in Hive):
> 1) keys vector;
> 2) values vector;
> 3) offsets vector, which points to the start index of each map in
> the two previous vectors.
> But there is a major issue with such an implementation: it's hard to
> quickly retrieve values by key, and some advanced tricks are required
> to do this efficiently.
Curious: does Hive solve this problem, i.e. efficiently retrieving values by
key?
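
To make the lookup cost concrete, here is a minimal sketch in Python of the three-vector layout you describe. Plain lists stand in for value vectors, and every name in it (`keys`, `values`, `offsets`, `lookup`) is made up for illustration rather than taken from Drill, Hive, or Arrow. It also assumes the offsets vector carries one trailing entry marking the end of the last map, which your description does not spell out.

```python
# Plain lists stand in for value vectors; all names are illustrative only.
keys = ["a", "b", "a", "c", "d"]   # keys of all maps, laid out back to back
values = [1, 2, 3, 4, 5]           # values, parallel to `keys`
# Assumption: offsets has one extra trailing entry, so map i owns the
# entries offsets[i] .. offsets[i + 1] - 1. Map 1 here is empty.
offsets = [0, 2, 2, 5]


def lookup(row, key):
    """Return the value stored under `key` in map `row`, or None if absent."""
    start, end = offsets[row], offsets[row + 1]
    # Without extra structure (sorted keys, a hash index, ...) this is a
    # linear scan over the entries of this one map.
    for i in range(start, end):
        if keys[i] == key:
            return values[i]
    return None
```

With this layout each lookup costs time proportional to the number of entries in that row's map, which is presumably the inefficiency you are worried about.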
>
> I would be happy if you could share your expertise on this topic.
> After learning some of the history of changes in Arrow, I found
> that the old map vector was renamed to struct and the map datatype was
> declared as a list of structs, each of them containing vectors for keys
> and values.
> I'm still very interested in how Maps work internally in Arrow, and I'd
> like to implement a similar one in Drill (so that future integration
> with Arrow can go more smoothly). Also, if you need a new vector
> for Map too, I would be happy to contribute it to both the Drill and
> Arrow projects.
Arrow doesn’t have an implementation of the MAP type yet. The plan is to build
it as a list of structs, so it’ll end up very similar to what you suggest (an
offsets array in the list vector, and child vectors for the keys and values).
https://issues.apache.org/jira/browse/ARROW-1279
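
As a rough illustration of that list-of-structs shape (not Arrow code, and not necessarily how ARROW-1279 will end up being implemented), the Python sketch below flattens a column of maps into an offsets array plus key and value child arrays. The function name `flatten_maps` is hypothetical, and plain lists stand in for vectors.

```python
def flatten_maps(rows):
    """Flatten a list of dicts into (offsets, keys, values).

    Hypothetical helper: offsets[i] .. offsets[i + 1] - 1 are the positions
    of row i's entries in the two child lists.
    """
    offsets, keys, values = [0], [], []
    for row in rows:
        for k, v in row.items():
            keys.append(k)      # child "vector" holding the struct's key field
            values.append(v)    # child "vector" holding the struct's value field
        offsets.append(len(keys))  # end of this row, start of the next
    return offsets, keys, values


offsets, keys, values = flatten_maps([{"a": 1, "b": 2}, {}, {"a": 3}])
```

For the three rows above this gives offsets `[0, 2, 2, 3]`, keys `["a", "b", "a"]` and values `[1, 2, 3]`; the empty map in the middle takes no space in the child arrays and shows up only as a repeated offset.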
>
> [1] : https://issues.apache.org/jira/browse/DRILL-3290
> [2] : https://github.com/paul-rogers/drill/wiki/Drill-Maps
>
> Thanks for your attention,
> Igor Guzenko