Wes McKinney created ARROW-38:
---------------------------------

             Summary: C++: Algorithms for using nested types in a hash table context
                 Key: ARROW-38
                 URL: https://issues.apache.org/jira/browse/ARROW-38
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Wes McKinney

Computing hash values and performing equality comparisons for top-level slots in nested-type data can be fairly complex (for example, computing DISTINCT on a {{List<List<Int32>>}}; related: ARROW-32). Additionally, value slots at any level of the type tree can be null.

We should explore various algorithms and evaluate their performance and memory use in practical settings. For example, one can compute a contiguous "record" (a byte array) from a depth-first traversal of a single value slot, and then use that record to compute a hash value or to compare with another slot.

If anyone has other ideas from past experience, I would be keen to learn more.
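
To make the depth-first "record" idea concrete, here is a minimal sketch. It does not use Arrow's actual array layout or API: {{Value}}, {{EncodeSlot}}, {{SlotRecord}}, {{HashSlot}} and {{CountDistinct}} are hypothetical names invented for illustration, and the encoding (a one-byte kind tag, then a length prefix for lists) is only one possible choice. The tag byte is what lets a null at any depth differ from an empty list or a zero, and the length prefix keeps {{[[1],[2]]}} from colliding with {{[[1,2]]}}.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical in-memory model of one value slot (not Arrow's columnar layout).
enum class Kind : uint8_t { kNull = 0, kInt32 = 1, kList = 2 };

struct Value {
  Kind kind = Kind::kNull;
  int32_t i32 = 0;              // used only when kind == kInt32
  std::vector<Value> children;  // used only when kind == kList

  static Value Null() { return Value{}; }
  static Value Int(int32_t v) {
    Value x;
    x.kind = Kind::kInt32;
    x.i32 = v;
    return x;
  }
  static Value List(std::vector<Value> items) {
    Value x;
    x.kind = Kind::kList;
    x.children = std::move(items);
    return x;
  }
};

// Depth-first traversal that appends a self-delimiting encoding of `v` to `out`.
// Bytes are written in native byte order, so records are only comparable
// within a single process.
void EncodeSlot(const Value& v, std::string* out) {
  out->push_back(static_cast<char>(v.kind));  // tag byte: null vs. int vs. list
  switch (v.kind) {
    case Kind::kNull:
      break;  // a null slot is fully described by its tag
    case Kind::kInt32: {
      char buf[sizeof(int32_t)];
      std::memcpy(buf, &v.i32, sizeof(buf));
      out->append(buf, sizeof(buf));
      break;
    }
    case Kind::kList: {
      assert(v.children.size() <= UINT32_MAX);
      const uint32_t n = static_cast<uint32_t>(v.children.size());
      char buf[sizeof(n)];
      std::memcpy(buf, &n, sizeof(buf));
      out->append(buf, sizeof(buf));  // length prefix keeps nesting unambiguous
      for (const Value& child : v.children) EncodeSlot(child, out);
      break;
    }
  }
}

// Contiguous byte record for one top-level slot.
std::string SlotRecord(const Value& v) {
  std::string rec;
  EncodeSlot(v, &rec);
  return rec;
}

// Hash of a slot, computed over its record.
std::size_t HashSlot(const Value& v) {
  return std::hash<std::string>{}(SlotRecord(v));
}

// DISTINCT over top-level slots: two slots are equal iff their records match.
std::size_t CountDistinct(const std::vector<Value>& slots) {
  std::unordered_set<std::string> seen;
  for (const Value& s : slots) seen.insert(SlotRecord(s));
  return seen.size();
}
```

This version materializes a full record per slot, so it trades memory and copying for simplicity. An alternative to explore would be feeding the same depth-first byte stream into an incremental hash and comparing slots structurally only when hashes collide.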