Hi Micah,

Thanks for bringing this up.

> 1. An efficient solution already exists? It seems like TransferPair
> implementations could possibly be improved upon or have they already been
> optimized?

Fundamentally, a memory copy is unavoidable, IMO, because the source and
target memory regions are likely to be non-contiguous. An alternative would
be to make ArrowBuf support a number of non-contiguous memory regions.
However, that would harm the performance of ArrowBuf, and ArrowBuf is the
core of the Arrow library.

> 2. What the preferred API for doing this would be? Some options i can
> think of:
>
> * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> * VectorLoader.load(Collection<ArrowRecordBatch>)

IMO, option 1 is required, as we have scenarios that need to concatenate
vectors/VectorSchemaRoots (e.g. restoring the complete dictionary from
delta dictionaries). Options 2 and 3 are optional for us.

Best,
Liya Fan

On Thu, Nov 7, 2019 at 3:44 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi,
> A colleague opened up https://issues.apache.org/jira/browse/ARROW-7048 for
> having similar functionality to the python APIs that allow for creating one
> larger data structure from a series of record batches. I just wanted to
> surface it here in case:
> 1. An efficient solution already exists? It seems like TransferPair
> implementations could possibly be improved upon or have they already been
> optimized?
> 2. What the preferred API for doing this would be? Some options i can
> think of:
>
> * VectorSchemaRoot.concat(Collection<VectorSchemaRoot>)
> * VectorSchemaRoot.from(Collection<ArrowRecordBatch>)
> * VectorLoader.load(Collection<ArrowRecordBatch>)
>
> Thanks,
> Micah
>
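For the archives, here is a minimal sketch of what an option-1-style concat could look like today in user code, built only on existing Arrow Java APIs (VectorSchemaRoot.create and ValueVector.copyFromSafe, which reallocates the target buffers as needed). The RootConcat class and its concat helper are hypothetical names for illustration, not a proposed final API; a real implementation would likely copy buffer ranges instead of individual values:

```java
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.Schema;

public final class RootConcat {

  /**
   * Hypothetical helper: copies the rows of every input root (all assumed to
   * share one schema) into a single newly allocated VectorSchemaRoot.
   */
  public static VectorSchemaRoot concat(BufferAllocator allocator,
                                        List<VectorSchemaRoot> roots) {
    Schema schema = roots.get(0).getSchema();
    VectorSchemaRoot result = VectorSchemaRoot.create(schema, allocator);
    result.allocateNew();

    int rowCount = 0;
    for (VectorSchemaRoot root : roots) {
      int n = root.getRowCount();
      for (int col = 0; col < schema.getFields().size(); col++) {
        FieldVector src = root.getVector(col);
        FieldVector dst = result.getVector(col);
        for (int i = 0; i < n; i++) {
          // copyFromSafe grows the destination buffers if they are too small,
          // so non-contiguous source batches are handled transparently.
          dst.copyFromSafe(i, rowCount + i, src);
        }
      }
      rowCount += n;
    }
    result.setRowCount(rowCount);
    return result;
  }
}
```

A value-by-value copy like this is obviously slower than a buffer-level copy, which is exactly why a built-in, optimized VectorSchemaRoot.concat would be worth having.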