Wes McKinney created ARROW-38:
---------------------------------

             Summary: C++: Algorithms for using nested types in a hash table context
                 Key: ARROW-38
                 URL: https://issues.apache.org/jira/browse/ARROW-38
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Wes McKinney


Computing hash values (and performing equality comparisons) for top-level slots 
in nested-type data (for example, computing DISTINCT on a 
{{List<List<Int32>>}}; related: ARROW-32) can be fairly complex. Additionally, 
value slots at any level of the type tree can be null, so both hashing and 
equality must account for nulls at every depth.

We should explore various algorithms and measure their performance and memory 
use in practical settings. For example, one can serialize a single value slot, 
via a depth-first traversal, into a contiguous "record" (byte array) that is 
then hashed or compared with the record of another slot. If anyone has other 
ideas from past experience, I would be keen to hear them.
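
As a rough illustration of the "record" idea above, here is a minimal sketch 
in C++17. It is not a proposed Arrow API and does not use Arrow's columnar 
memory layout: the aliases Int32Slot, InnerList and OuterList and the 
functions EncodeSlot and CountDistinct are hypothetical names invented for 
this example, with std::optional standing in for nullable slots. Each level 
writes a validity byte, lists write a length prefix, and children are visited 
depth-first, so two slots should be logically equal exactly when their 
records are byte-for-byte equal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for one slot of List<List<Int32>>.
// std::nullopt models a null at that level of the type tree.
using Int32Slot = std::optional<int32_t>;
using InnerList = std::optional<std::vector<Int32Slot>>;
using OuterList = std::optional<std::vector<InnerList>>;

// Append a 32-bit value in native byte order (fine within one process).
inline void AppendU32(std::string* out, uint32_t v) {
  char buf[sizeof(v)];
  std::memcpy(buf, &v, sizeof(v));
  out->append(buf, sizeof(v));
}

// Length prefix keeps [[1], [2]] distinct from [[1, 2]].
inline void AppendLength(std::string* out, std::size_t n) {
  assert(n <= std::numeric_limits<uint32_t>::max());
  AppendU32(out, static_cast<uint32_t>(n));
}

// Validity byte first, so null and "present" never share a prefix.
inline void EncodeInt32(const Int32Slot& v, std::string* out) {
  out->push_back(v.has_value() ? '\1' : '\0');
  if (v.has_value()) AppendU32(out, static_cast<uint32_t>(*v));
}

inline void EncodeInner(const InnerList& v, std::string* out) {
  out->push_back(v.has_value() ? '\1' : '\0');
  if (!v.has_value()) return;
  AppendLength(out, v->size());
  for (const Int32Slot& child : *v) EncodeInt32(child, out);
}

inline void EncodeOuter(const OuterList& v, std::string* out) {
  out->push_back(v.has_value() ? '\1' : '\0');
  if (!v.has_value()) return;
  AppendLength(out, v->size());
  for (const InnerList& child : *v) EncodeInner(child, out);
}

// Depth-first serialization of a single top-level slot into one record.
inline std::string EncodeSlot(const OuterList& slot) {
  std::string record;
  EncodeOuter(slot, &record);
  return record;
}

// DISTINCT count: hash and compare the records instead of the nested values.
inline std::size_t CountDistinct(const std::vector<OuterList>& column) {
  std::unordered_set<std::string> seen;
  for (const OuterList& slot : column) seen.insert(EncodeSlot(slot));
  return seen.size();
}
```

The obvious cost of this approach is one temporary record per slot, which is 
part of the memory-use trade-off mentioned above; an alternative that hashes 
incrementally during the traversal would avoid the copy but needs a separate 
equality routine.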



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
