[ 
https://issues.apache.org/jira/browse/ARROW-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-4437:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/20996

> [Python] Add builder API
> ------------------------
>
>                 Key: ARROW-4437
>                 URL: https://issues.apache.org/jira/browse/ARROW-4437
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>         Environment: Python 3.7.0 pyarrow-0.12.0
>            Reporter: Zhuang Tianyi
>            Priority: Minor
>
> There is no [Array 
> Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE]
>  API in the Python bindings. When I generate data from a stream, I have to
> build a Python list (high overhead) or a pandas object first, then finalize
> it by calling pa.array, which copies the data. It seems that we could build
> an Array directly from two or three pa.ResizableBuffer objects in O(1) time.
> It is possible to maintain these buffers (value buffer, null bitmap, offset 
> buffer) manually with the currently exported API, but that is not safe enough.
>  
> I found an undocumented StringBuilder API in 
> [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi],
>  corresponding to 
> [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder].
>  Will the other ArrayBuilder APIs be added to the Python bindings?
>  
> ----
> One more thing:
> a BatchBuilder API would be even better, if possible.
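
For illustration, here is a rough sketch of what maintaining the three buffers from the description (value buffer, null bitmap, offset buffer) by hand might look like for a string array. StringBufferBuilder is a hypothetical helper, not part of pyarrow. The sketch uses plain bytearrays rather than pa.ResizableBuffer, so finish() copies the data and is not O(1). It assumes the usual little-endian Arrow layout with int32 offsets, and the final step assumes that pa.Array.from_buffers and pa.py_buffer are available in the installed pyarrow; it is skipped if pyarrow cannot be imported.

```python
import struct


class StringBufferBuilder:
    """Hypothetical helper: accumulates the three buffers of an Arrow string array."""

    def __init__(self):
        self.values = bytearray()    # UTF-8 bytes of all strings, concatenated
        self.offsets = [0]           # int32 offsets; always length + 1 entries
        self.validity = bytearray()  # null bitmap, one bit per slot, LSB first
        self.length = 0
        self.null_count = 0

    def append(self, value):
        i = self.length
        if i % 8 == 0:
            # Start a new bitmap byte every eight slots.
            self.validity.append(0)
        if value is None:
            self.null_count += 1
        else:
            self.values += value.encode("utf-8")
            self.validity[i // 8] |= 1 << (i % 8)
        # A null slot repeats the previous offset (zero-length entry).
        self.offsets.append(len(self.values))
        self.length += 1

    def finish(self):
        # Pack offsets as little-endian int32; this step copies the data.
        offsets = struct.pack("<%di" % len(self.offsets), *self.offsets)
        return bytes(self.validity), offsets, bytes(self.values)


builder = StringBufferBuilder()
for item in ["foo", None, "quux"]:
    builder.append(item)
validity, offsets, values = builder.finish()

try:
    import pyarrow as pa
except ImportError:
    pa = None

if pa is not None:
    # Assumption: buffer order for a string array is validity, offsets, data.
    arr = pa.Array.from_buffers(
        pa.string(),
        builder.length,
        [pa.py_buffer(validity), pa.py_buffer(offsets), pa.py_buffer(values)],
        null_count=builder.null_count,
    )
```

Every invariant here (offsets having length + 1 entries, bitmap bit order, a matching null count) has to be kept in sync by the caller, which is the safety concern the description raises about doing this manually.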



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
