Zhuang Tianyi created ARROW-4437:
------------------------------------

             Summary: [Python] Add builder API
                 Key: ARROW-4437
                 URL: https://issues.apache.org/jira/browse/ARROW-4437
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
         Environment: Python 3.7.0 pyarrow-0.12.0
            Reporter: Zhuang Tianyi


There is no
[Array Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE]
API in the Python bindings. When I generate data from a stream, I have to
build a Python list (high overhead) or a pandas object first, then finalize it
by calling pa.array, which copies the data. It seems that we could instead
build an Array directly from a few (two or three) pa.ResizableBuffer objects in
O(1) time.

It is possible to maintain these buffers (value buffer, null bitmap, offset 
buffer) manually with the currently exported API, but that is not safe enough.
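
To illustrate what maintaining these buffers by hand involves, here is a
minimal pure-Python sketch for a string column. The class name
StringArrayBuffers and its attributes are hypothetical and are not part of
pyarrow; the sketch only assumes the Arrow layout for variable-length strings
(a validity bitmap with least-significant-bit ordering, little-endian int32
offsets with length + 1 entries, and a buffer of concatenated UTF-8 bytes).

```python
import struct


class StringArrayBuffers:
    """Hypothetical helper that accumulates Arrow-layout buffers for strings."""

    def __init__(self):
        self.validity = bytearray()                      # 1 bit per slot, LSB first
        self.offsets = bytearray(struct.pack("<i", 0))   # int32 offsets, length + 1 entries
        self.data = bytearray()                          # concatenated UTF-8 bytes
        self.length = 0
        self.null_count = 0

    def append(self, value):
        # Grow the bitmap by one zeroed byte every 8 slots.
        if self.length % 8 == 0:
            self.validity.append(0)
        if value is None:
            # A null keeps its validity bit at 0 and adds no bytes to the data buffer.
            self.null_count += 1
        else:
            self.validity[self.length // 8] |= 1 << (self.length % 8)
            self.data += value.encode("utf-8")
        # Every slot, null or not, records the end offset of its value.
        self.offsets += struct.pack("<i", len(self.data))
        self.length += 1


builder = StringArrayBuffers()
for item in ["foo", None, "quux"]:
    builder.append(item)
```

Each append has to keep three buffers and two counters consistent, and a
single off-by-one in the offsets or the bitmap silently corrupts the array,
which is the safety concern here. If pa.py_buffer and pa.Array.from_buffers
are available in the installed pyarrow (an assumption, not checked against
0.12.0), the three buffers could presumably be wrapped as an Array without
converting the values again.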

 

I found an undocumented StringBuilder API in 
[python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi],
 corresponding to 
[https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder].
 Will the other ArrayBuilder APIs be added to the Python bindings?

 
----
One more thing:

A BatchBuilder API would be even better, if possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
