Thanks for the quick response. I'll update the discussion if there is any progress.
On Mon, Feb 25, 2019 at 6:01 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Igor,
>
> We have Map as a top-level logical data type in the columnar metadata:
>
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L55
>
> There isn't anything more than this right now. We have not yet implemented
> container types for the Map type in Java or C++, but I don't view it as
> an extremely large project, because Map is an alias for a list of
> structs. If you'd like to contribute this to the Java library, it
> would be appreciated.
>
> For fast key retrieval, the keys should be sorted (and this property
> would be set in the metadata) so that a binary search can be used
> instead of a linear search.
>
> Thanks,
> Wes
>
> On Mon, Feb 25, 2019 at 8:39 AM Ihor Huzenko <ihor.huzenko....@gmail.com>
> wrote:
> >
> > Hello Arrow Team,
> >
> > My name is Igor Guzenko. I'm currently working on a task related to
> > complex types in Apache Drill [1], and I ran into the issue that Drill
> > has no appropriate vector for representing a canonical (Java-like) Map
> > data type [2]. So I'm looking for inspiration on how an efficient
> > columnar map vector could be implemented. I believe such a map can be
> > composed of three value vectors (as in Hive):
> > 1) a keys vector;
> > 2) a values vector;
> > 3) an offsets vector that points to the start index of each map in
> > the two previous vectors.
> > But there is a major issue with such an implementation: it's hard to
> > retrieve values by key quickly, and some advanced tricks are required
> > to do this efficiently.
> >
> > I would be happy if you could share your expertise on this topic.
> > After learning some of the history of changes in Arrow, I found
> > that the old map vector was renamed to struct, and the map data type was
> > declared as a list of structs, each containing vectors for keys
> > and values.
> > I'm still very interested in how Maps work internally in Arrow, and I'd
> > like to implement a similar one in Drill (so that future integration
> > with Arrow could go more smoothly). Also, if you need a new vector
> > for Map too, I would be happy to contribute it to both the Drill and
> > Arrow projects.
> >
> > [1] : https://issues.apache.org/jira/browse/DRILL-3290
> > [2] : https://github.com/paul-rogers/drill/wiki/Drill-Maps
> >
> > Thanks for your attention,
> > Igor Guzenko
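For illustration, here is a minimal sketch of the three-vector layout described above (keys, values, offsets), combined with the sorted-keys suggestion so that a lookup within one row is a binary search instead of a linear scan. It uses plain Java arrays rather than any Drill or Arrow vector API; the class `SortedMapColumn` and its methods are hypothetical. It assumes that keys are already sorted within each row. The layout roughly mirrors a list of structs: `offsets` plays the role of the list offsets, and `keys`/`values` play the roles of the struct's two child vectors.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified map column (not a Drill or Arrow class).
 * Keys and values of all rows are flattened into two parallel arrays;
 * the entries of row i occupy indices [offsets[i], offsets[i + 1]).
 * Assumption: keys are sorted within each row, so lookups can binary-search.
 */
public class SortedMapColumn {
    private final int[] offsets;  // length = rowCount + 1, offsets[0] == 0
    private final String[] keys;  // all keys, row after row
    private final long[] values;  // values[j] is the value for keys[j]

    public SortedMapColumn(int[] offsets, String[] keys, long[] values) {
        if (keys.length != values.length || offsets[offsets.length - 1] != keys.length) {
            throw new IllegalArgumentException("offsets, keys and values are inconsistent");
        }
        this.offsets = offsets;
        this.keys = keys;
        this.values = values;
    }

    public int rowCount() {
        return offsets.length - 1;
    }

    /** Returns the value stored under key in the given row, or null if the key is absent. */
    public Long get(int row, String key) {
        int start = offsets[row];
        int end = offsets[row + 1];
        // Binary search restricted to this row's slice [start, end);
        // the returned index is absolute within the keys array.
        int idx = Arrays.binarySearch(keys, start, end, key);
        return idx >= 0 ? values[idx] : null;
    }

    public static void main(String[] args) {
        // Row 0: {a=1, c=3}; row 1: {} (empty map); row 2: {b=20, d=40, e=50}
        int[] offsets = {0, 2, 2, 5};
        String[] keys = {"a", "c", "b", "d", "e"};
        long[] values = {1, 3, 20, 40, 50};
        SortedMapColumn col = new SortedMapColumn(offsets, keys, values);
        System.out.println(col.get(0, "c")); // 3
        System.out.println(col.get(1, "a")); // null
        System.out.println(col.get(2, "d")); // 40
    }
}
```

Keeping the keys sorted costs a sort per row at write time, but reading a single key from a row then takes O(log k) comparisons for a row with k entries instead of O(k). An empty map is simply a row whose two offsets are equal.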