[ https://issues.apache.org/jira/browse/ARROW-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-4437:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/20996

> [Python] Add builder API
> ------------------------
>
>                 Key: ARROW-4437
>                 URL: https://issues.apache.org/jira/browse/ARROW-4437
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>         Environment: Python 3.7.0 pyarrow-0.12.0
>            Reporter: Zhuang Tianyi
>            Priority: Minor
>
> There is no [Array Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE] API in the Python bindings. When I generate data from a stream, I have to build a Python list (high overhead) or a pandas object, then finalize it by calling pa.array, which copies the data. It seems that we could instead build an Array directly from two or three pa.ResizableBuffer objects in O(1) time.
> It is possible to maintain these buffers (value buffer, null bitmap, offset buffer) manually with the currently exported API, but that is not safe enough.
>
> I found an undocumented StringBuilder API in [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi], corresponding to [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder]. Will other ArrayBuilder APIs be added to the Python bindings?
>
> ----
> Additionally, a BatchBuilder API would be even better, if possible.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
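To make the "maintain the buffers manually" idea concrete, here is a minimal pure-Python sketch of what such a helper would have to keep consistent for a string column: an LSB-first validity bitmap, an int32 offsets buffer with length + 1 entries, and a UTF-8 data buffer. The class name `StringColumnBuilder` and its methods are hypothetical illustrations, not part of pyarrow; it assumes the standard Arrow variable-size binary layout and skips the recommended 8/64-byte buffer padding.

```python
import struct
from typing import Optional, Tuple


class StringColumnBuilder:
    """Hypothetical helper (not a pyarrow API) that grows the three buffers
    of an Arrow string array incrementally, without an intermediate list."""

    def __init__(self) -> None:
        self.validity = bytearray()                     # 1 bit per slot, LSB-first
        self.offsets = bytearray(struct.pack("<i", 0))  # length + 1 int32 values
        self.data = bytearray()                         # concatenated UTF-8 bytes
        self.length = 0
        self.null_count = 0

    def _set_valid(self, is_valid: bool) -> None:
        byte_index, bit_index = divmod(self.length, 8)
        if byte_index == len(self.validity):
            self.validity.append(0)                     # new bitmap byte, all bits null
        if is_valid:
            self.validity[byte_index] |= 1 << bit_index

    def append(self, value: Optional[str]) -> None:
        if value is None:
            self._set_valid(False)
            self.null_count += 1
        else:
            self._set_valid(True)
            self.data += value.encode("utf-8")
        # A null slot repeats the previous offset, so it spans zero bytes.
        # struct.error is raised if data grows past the int32 offset limit.
        self.offsets += struct.pack("<i", len(self.data))
        self.length += 1

    def get(self, i: int) -> Optional[str]:
        """Decode slot i back from the buffers (useful for checking them)."""
        if not (self.validity[i // 8] >> (i % 8)) & 1:
            return None
        start, end = struct.unpack_from("<ii", self.offsets, 4 * i)
        return self.data[start:end].decode("utf-8")

    def finish(self) -> Tuple[bytes, bytes, bytes]:
        # One bulk copy per buffer, instead of a per-element conversion.
        return bytes(self.validity), bytes(self.offsets), bytes(self.data)


builder = StringColumnBuilder()
for item in ["foo", None, "bar"]:
    builder.append(item)
validity, offsets, data = builder.finish()
```

With pyarrow installed, the finished buffers could then be wrapped rather than converted element by element, e.g. `pa.Array.from_buffers(pa.string(), builder.length, [pa.py_buffer(validity), pa.py_buffer(offsets), pa.py_buffer(data)], null_count=builder.null_count)`. Encapsulating the bookkeeping in one class is what addresses the safety concern: the bitmap, offsets, and data can only be updated together through `append`.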