Wes McKinney created ARROW-7394:
-----------------------------------

             Summary: [C++] Implement zero-copy optimizations when performing 
Filter on ChunkedArray
                 Key: ARROW-7394
                 URL: https://issues.apache.org/jira/browse/ARROW-7394
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Wes McKinney


For high-selectivity filters (most elements included), it may be wasteful and 
slow to copy large contiguous ranges of array chunks into the resulting 
ChunkedArray. Instead, we can scan the boolean filter array and slice the 
selected contiguous ranges out of the source data, which avoids the copy. 
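
As a rough illustration of the scanning step, here is a minimal standalone 
sketch. It uses a plain std::vector<bool> in place of an Arrow BooleanArray, 
and FindSelectedRuns is a hypothetical helper name, not an existing Arrow 
function. It only reports the (offset, length) of each run of selected 
elements. In Arrow each run could then be handed to Array::Slice(offset, 
length) on the source chunk, with chunk boundaries handled separately.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the Arrow API): returns the
// (offset, length) of every maximal run of selected elements.
std::vector<std::pair<int64_t, int64_t>> FindSelectedRuns(
    const std::vector<bool>& filter) {
  std::vector<std::pair<int64_t, int64_t>> runs;
  const int64_t n = static_cast<int64_t>(filter.size());
  int64_t i = 0;
  while (i < n) {
    if (!filter[i]) {
      ++i;  // skip unselected elements
      continue;
    }
    const int64_t start = i;
    while (i < n && filter[i]) ++i;  // extend the run of selected elements
    runs.emplace_back(start, i - start);
  }
  return runs;
}
```

For the filter 1 1 0 1 this yields the runs (0, 2) and (3, 1).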

We will need to empirically determine how large the contiguous range needs to 
be in order to merit the slice-based approach versus simple copy-based 
materialization. For example, in a filter array like

1 0 1 0 1 0 1 0 1

it would not make sense to slice 5 times because slicing carries some overhead. 
But if we had

1 ... 1 [100 1's] 0 1 ... 1 [100 1's] 0 1 ... 1 [100 1's] 0 1 ... 1 [100 1's] 

then performing 4 slices may be faster than doing a copy materialization. 
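
The trade-off in the two examples above can be sketched as a small planning 
pass. This is only an illustration under stated assumptions: FilterPlan, 
PlanFilter and min_slice_length are hypothetical names, and the threshold 
value is a placeholder for whatever benchmarking ends up suggesting. Runs of 
selected elements at least as long as the threshold are counted as slices, 
and shorter runs are counted as elements to copy.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of how a filter would be executed.
struct FilterPlan {
  int64_t num_slices = 0;       // runs long enough to slice without copying
  int64_t copied_elements = 0;  // selected elements in shorter runs
};

// Hypothetical sketch: min_slice_length is the empirically tuned threshold.
FilterPlan PlanFilter(const std::vector<bool>& filter,
                      int64_t min_slice_length) {
  assert(min_slice_length >= 1);
  FilterPlan plan;
  int64_t run = 0;  // length of the current run of selected elements
  auto flush = [&]() {
    if (run >= min_slice_length) {
      ++plan.num_slices;
    } else {
      plan.copied_elements += run;
    }
    run = 0;
  };
  for (bool selected : filter) {
    if (selected) {
      ++run;
    } else {
      flush();
    }
  }
  flush();  // account for a run that reaches the end of the filter
  return plan;
}
```

With an arbitrary threshold of 32, the alternating filter above plans 0 
slices and 5 copied elements, while the filter with four runs of 100 
selected elements plans 4 slices and no copying.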



