Hi Igor,

We have Map as a top-level logical data type in the columnar metadata:

https://github.com/apache/arrow/blob/master/format/Schema.fbs#L55

There isn't anything more than this right now. We have not yet
implemented container types for Map in Java or C++, but I don't
expect it to be a very large project because Map is an alias for a
list of structs (each struct holding one key and one value). If you'd
like to contribute this to the Java library, it would be appreciated.
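
To make the "list of structs" point concrete, here is a minimal sketch in
plain Python. It is not Arrow's API: the names encode_maps and decode_map
are made up for illustration, and null handling (validity bitmaps) is left
out. It only shows how a column of maps flattens into an offsets array plus
a keys child and a values child:

```python
def encode_maps(rows):
    """Flatten a column of maps into offsets + key/value child arrays.

    Illustrative only: this mimics the list-of-structs layout with plain
    Python lists and is not the Arrow implementation.
    """
    offsets = [0]
    keys, values = [], []
    for row in rows:
        for k, v in row.items():
            keys.append(k)
            values.append(v)
        # offsets[i + 1] marks where map i ends in the child arrays
        offsets.append(len(keys))
    return offsets, keys, values


def decode_map(offsets, keys, values, i):
    """Rebuild the i-th map from its slice of the child arrays."""
    start, end = offsets[i], offsets[i + 1]
    return dict(zip(keys[start:end], values[start:end]))


rows = [{"a": 1, "b": 2}, {}, {"c": 3}]
offsets, keys, values = encode_maps(rows)
```

An empty map shows up as two equal consecutive offsets, so no special
marker is needed for it.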

For fast key retrieval, the keys should be sorted (and this property
would be set in the metadata) so that a binary search can be used
instead of linear search.
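
As a rough sketch of what I mean (plain Python again, not Arrow code; the
function name lookup and the sample arrays are hypothetical), a lookup can
binary-search only the slice of the keys array that belongs to one map,
assuming the keys inside each map are stored in ascending order:

```python
from bisect import bisect_left


def lookup(offsets, keys, values, row, key):
    """Return the value for `key` in map `row`, or None if absent.

    Assumes the keys inside each map's slice are sorted ascending.
    """
    start, end = offsets[row], offsets[row + 1]
    # Binary search restricted to this map's slice [start, end)
    pos = bisect_left(keys, key, start, end)
    if pos < end and keys[pos] == key:
        return values[pos]
    return None


# Two maps: {"a": 1, "c": 3, "e": 5} and {"b": 2, "d": 4}
offsets = [0, 3, 5]
keys = ["a", "c", "e", "b", "d"]
values = [1, 3, 5, 2, 4]
```

Without the sortedness guarantee the same function would have to fall back
to scanning keys[start:end] linearly.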

Thanks,
Wes

On Mon, Feb 25, 2019 at 8:39 AM Ihor Huzenko <ihor.huzenko....@gmail.com> wrote:
>
> Hello Arrow Team,
>
> My name is Igor Guzenko. I'm currently working on a task related to
> complex types in Apache Drill [1], and ran into an issue: Drill has no
> appropriate vector for representing the canonical (Java-like) Map
> datatype [2]. So I'm looking for inspiration on how an efficient
> columnar map vector can be implemented. I believe that such a map can
> be composed of three value vectors (like in Hive):
>   1) keys vector;
>   2) values vector;
>   3) offsets vector, which points to the start index of each map in
> the two previous vectors.
> But there is a major issue with such an implementation: it's hard to
> quickly retrieve values by key, and some advanced tricks are required
> to do this efficiently.
>
> I would be happy if you could share your expertise on this topic.
> After learning some of the history of changes in Arrow, I found
> that the old map vector was renamed to struct and the map datatype was
> declared as a list of structs, each of them containing a vector for
> keys and values.
> I'm still very interested in how Maps work internally in Arrow and I'd
> like to implement a similar one in Drill (so that future integration
> with Arrow could go more smoothly). Also, if you need a new vector
> for Map too, I would be happy to contribute it to both the Drill and
> Arrow projects.
>
> [1] : https://issues.apache.org/jira/browse/DRILL-3290
> [2] : https://github.com/paul-rogers/drill/wiki/Drill-Maps
>
> Thanks for your attention,
> Igor Guzenko
