Hi Micah,

Thanks for bringing this up.

> 1. An efficient solution already exists? It seems like TransferPair
> implementations could possibly be improved upon or have they already been
> optimized?

Fundamentally, a memory copy is unavoidable, IMO, because the source and
target memory regions are likely to be non-contiguous. An alternative would
be to make ArrowBuf support a number of non-contiguous memory regions.
However, that would harm the performance of ArrowBuf, and ArrowBuf is the
core of the Arrow library.

> 2. What the preferred API for doing this would be? Some options i can
> think of:
>
> * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> * VectorLoader.load(Collection<ArrowRecordBatch>)

IMO, option 1 is required, as we have scenarios that need to concatenate
vectors/VectorSchemaRoots (e.g. restoring the complete dictionary from
delta dictionaries). Options 2 and 3 are optional for us.

Best,
Liya Fan

On Thu, Nov 7, 2019 at 3:44 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi,
> A colleague opened up https://issues.apache.org/jira/browse/ARROW-7048 for
> having similar functionality to the python APIs that allow for creating one
> larger data structure from a series of record batches. I just wanted to
> surface it here in case:
> 1. An efficient solution already exists? It seems like TransferPair
> implementations could possibly be improved upon or have they already been
> optimized?
> 2. What the preferred API for doing this would be? Some options i can
> think of:
>
> * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> * VectorLoader.load(Collection<ArrowRecordBatch>)
>
> Thanks,
> Micah
>
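For the archives, here is a minimal sketch of what an option-1-style concat could look like today in user code, built only on existing Arrow Java APIs (VectorSchemaRoot.create and ValueVector.copyFromSafe, which reallocates the target buffers as needed). The RootConcat class and its concat helper are hypothetical names for illustration, not a proposed final API; a real implementation would likely copy buffer ranges instead of individual values:

```java
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.Schema;

public final class RootConcat {

  /**
   * Hypothetical helper: copies the rows of every input root (all assumed to
   * share one schema) into a single newly allocated VectorSchemaRoot.
   */
  public static VectorSchemaRoot concat(BufferAllocator allocator,
                                        List<VectorSchemaRoot> roots) {
    Schema schema = roots.get(0).getSchema();
    VectorSchemaRoot result = VectorSchemaRoot.create(schema, allocator);
    result.allocateNew();

    int rowCount = 0;
    for (VectorSchemaRoot root : roots) {
      int n = root.getRowCount();
      for (int col = 0; col < schema.getFields().size(); col++) {
        FieldVector src = root.getVector(col);
        FieldVector dst = result.getVector(col);
        for (int i = 0; i < n; i++) {
          // copyFromSafe grows the destination buffers if they are too small,
          // so non-contiguous source batches are handled transparently.
          dst.copyFromSafe(i, rowCount + i, src);
        }
      }
      rowCount += n;
    }
    result.setRowCount(rowCount);
    return result;
  }
}
```

A value-by-value copy like this is obviously slower than a buffer-level copy, which is exactly why a built-in, optimized VectorSchemaRoot.concat would be worth having.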