>
> I think having a chunked array with multiple vector buffers would be
> ideal, similar to C++. It might take a fair amount of work to add this but
> would open up a lot more functionality.


There are potentially two different use cases.  ChunkedArray is a
logical/lazy concatenation, whereas concat physically rebuilds the vectors
into a single vector.
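
To make the distinction concrete, here is a minimal sketch in plain Java that
uses int[] as a stand-in for Arrow vectors. The names ConcatSketch,
ChunkedIntArray and concat are hypothetical and only for illustration; they
are not Arrow Java APIs, and a real implementation would also have to deal
with validity buffers, offsets and allocators.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: int[] stands in for an Arrow vector.
final class ConcatSketch {

    /** Logical (lazy) concatenation: keeps the chunks as-is and only maps indices. */
    static final class ChunkedIntArray {
        private final List<int[]> chunks;
        // offsets[i] is the logical index of the first element of chunk i;
        // the last entry is the total length.
        private final int[] offsets;

        ChunkedIntArray(List<int[]> chunks) {
            this.chunks = chunks;
            this.offsets = new int[chunks.size() + 1];
            for (int i = 0; i < chunks.size(); i++) {
                offsets[i + 1] = offsets[i] + chunks.get(i).length;
            }
        }

        int length() {
            return offsets[offsets.length - 1];
        }

        int get(int index) {
            if (index < 0 || index >= length()) {
                throw new IndexOutOfBoundsException("index: " + index);
            }
            // Linear scan for the owning chunk; no element data is copied.
            int chunk = 0;
            while (index >= offsets[chunk + 1]) {
                chunk++;
            }
            return chunks.get(chunk)[index - offsets[chunk]];
        }
    }

    /** Physical concatenation: allocates one new array and copies every chunk into it. */
    static int[] concat(List<int[]> chunks) {
        int total = 0;
        for (int[] chunk : chunks) {
            total += chunk.length;
        }
        int[] out = new int[total];
        int pos = 0;
        for (int[] chunk : chunks) {
            System.arraycopy(chunk, 0, out, pos, chunk.length);
            pos += chunk.length;
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> batches = Arrays.asList(new int[] {1, 2, 3}, new int[] {4, 5});
        ChunkedIntArray lazy = new ChunkedIntArray(batches);
        int[] merged = concat(batches);
        System.out.println(lazy.get(3) + " " + merged[3]);
    }
}
```

The chunked form is cheap to build but pays an index lookup on every read,
while concat pays for one copy up front and then gives a single contiguous
array.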

On Fri, Nov 8, 2019 at 10:51 AM Bryan Cutler <cutl...@gmail.com> wrote:

> I think having a chunked array with multiple vector buffers would be
> ideal, similar to C++. It might take a fair amount of work to add this but
> would open up a lot more functionality. As for the API,
> VectorSchemaRoot.concat(Collection<VectorSchemaRoot>) seems good to me.
>
> On Thu, Nov 7, 2019 at 12:09 AM Fan Liya <liya.fa...@gmail.com> wrote:
>
>> Hi Micah,
>>
>> Thanks for bringing this up.
>>
>> > 1.  An efficient solution already exists? It seems like TransferPair
>> > implementations could possibly be improved upon or have they already been
>> > optimized?
>>
>> Fundamentally, a memory copy is unavoidable, IMO, because the source and
>> target memory regions are likely to be non-contiguous.
>> An alternative is to make ArrowBuf support a number of non-contiguous
>> memory regions. However, that would harm the performance of ArrowBuf, and
>> ArrowBuf is the core of the Arrow library.
>>
>> > 2.  What the preferred API for doing this would be?  Some options I can
>> > think of:
>>
>> > * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
>> > * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
>> > * VectorLoader.load(Collection<ArrowRecordBatch>)
>>
>> IMO, option 1 is required, as we have scenarios that need to concatenate
>> vectors/VectorSchemaRoots (e.g. restoring the complete dictionary from delta
>> dictionaries).
>> Options 2 and 3 are optional for us.
>>
>> Best,
>> Liya Fan
>>
>> On Thu, Nov 7, 2019 at 3:44 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>> > Hi,
>> > A colleague opened up https://issues.apache.org/jira/browse/ARROW-7048 for
>> > having similar functionality to the Python APIs that allow for creating one
>> > larger data structure from a series of record batches.  I just wanted to
>> > surface it here in case:
>> > 1.  An efficient solution already exists? It seems like TransferPair
>> > implementations could possibly be improved upon or have they already been
>> > optimized?
>> > 2.  What the preferred API for doing this would be?  Some options I can
>> > think of:
>> >
>> > * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
>> > * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
>> > * VectorLoader.load(Collection<ArrowRecordBatch>)
>> >
>> > Thanks,
>> > Micah
>> >
>>
>
