[ https://issues.apache.org/jira/browse/ARROW-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17657140#comment-17657140 ]
Rok Mihevc commented on ARROW-39:
---------------------------------

This issue has been migrated to [issue #15413|https://github.com/apache/arrow/issues/15413] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> C++: Logical chunked arrays / columns: conforming to fixed chunk sizes
> ----------------------------------------------------------------------
>
>                 Key: ARROW-39
>                 URL: https://issues.apache.org/jira/browse/ARROW-39
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>            Priority: Major
>             Fix For: 0.3.0
>
> Implementing algorithms on large arrays assembled in physical chunks is
> problematic if:
> - The chunks are not all the same size (except possibly the last chunk, which
>   may be smaller). In that case, retrieving a particular element generally
>   requires an O(log num_chunks) search over the chunk boundaries.
> - The chunk size is not a power of 2. Computing an integer modulus with a
>   non-power-of-2 divisor requires more clock cycles; in other words, {{i % p}}
>   is much more expensive to compute than {{i & (p - 1)}}, but the latter only
>   works if p is a power of 2.
> Most of the Arrow data adapters will either produce contiguous data (1 chunk,
> so chunking is not an issue) or a regular chunk size, so this isn't as much of
> an immediate concern, but we should consider making it a contract of any data
> structures dealing in multiple arrays.
> In general, it would be preferable to reorganize memory into either a regular
> chunk size (like 64K values per chunk) or a contiguous memory region. I would
> prefer, for the moment, not to invest significant energy in writing
> algorithms for data with irregular chunk sizes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)