Zhuang Tianyi created ARROW-4437:
------------------------------------

             Summary: [Python] Add builder API
                 Key: ARROW-4437
                 URL: https://issues.apache.org/jira/browse/ARROW-4437
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
         Environment: Python 3.7.0 pyarrow-0.12.0
            Reporter: Zhuang Tianyi

The Python bindings have no [Array Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE] API. When I generate data from a stream, I have to build a Python list (high overhead) or a pandas object first, then finalize it by calling pa.array, which copies the data.

It seems that we could build an Array directly from a few (two or three) pa.ResizableBuffer objects in O(1) time. It is possible to maintain these buffers (value buffer, null bitmap, offset buffer) manually with the currently exported API, but that is not safe enough.

I found an undocumented StringBuilder API in [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi], corresponding to [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder]. Will other ArrayBuilder APIs be added to the Python bindings?

----

Beyond that, a BatchBuilder API would be even better if possible.
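
To make the manual bookkeeping concrete, here is a minimal pure-Python sketch of the three buffers a string array needs (validity bitmap, int32 offsets, value bytes). The class name StringBufferBuilder and its methods are hypothetical and are not part of pyarrow; plain bytearray objects stand in for pa.ResizableBuffer, and the layout follows my understanding of the Arrow variable-length string format.

```python
import struct


class StringBufferBuilder:
    """Hypothetical helper (not a pyarrow class) that grows the three
    buffers of an Arrow-style string array one value at a time."""

    def __init__(self):
        self.validity = bytearray()                     # 1 bit per slot, least-significant bit first
        self.offsets = bytearray(struct.pack("<i", 0))  # little-endian int32, length + 1 entries
        self.data = bytearray()                         # concatenated UTF-8 bytes
        self.length = 0
        self.null_count = 0

    def _append_validity(self, is_valid):
        if self.length % 8 == 0:
            self.validity.append(0)  # start a new bitmap byte
        if is_valid:
            self.validity[self.length // 8] |= 1 << (self.length % 8)

    def append(self, value):
        """Append one str, or None for a null slot."""
        self._append_validity(value is not None)
        if value is None:
            self.null_count += 1
        else:
            self.data += value.encode("utf-8")
        # A null slot repeats the previous offset, so it has zero length.
        self.offsets += struct.pack("<i", len(self.data))
        self.length += 1

    def finish(self):
        """Return (length, null_count, [validity, offsets, data])."""
        buffers = [bytes(self.validity), bytes(self.offsets), bytes(self.data)]
        return self.length, self.null_count, buffers


builder = StringBufferBuilder()
for item in ["foo", None, "quux"]:
    builder.append(item)
length, null_count, (validity, offsets, data) = builder.finish()
```

If the installed pyarrow provides pa.py_buffer and pa.Array.from_buffers, the three results could presumably be wrapped and passed as pa.Array.from_buffers(pa.string(), length, [validity, offsets, data]) without going through a Python list. Every invariant above (bit order, offset count, null handling) is the caller's responsibility, which is the safety concern a real builder API would remove.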