Just for some reference, here are times from my system: I created a quick test to dump a ~1.7GB table to buffer(s).
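For context, the three variants were roughly along the lines of the sketch below. This is a simplified reconstruction in Arrow C++, not the exact benchmark code I ran, and it assumes `table` already holds the ~1.7GB of data:

#include <memory>
#include <vector>
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>

// Variant 1: "many buffers" -- just collect shared_ptrs to the existing
// buffers, no copying. (Simplified: ignores child data of nested types
// and dictionary data.)
std::vector<std::shared_ptr<arrow::Buffer>> CollectBuffers(
    const arrow::Table& table) {
  std::vector<std::shared_ptr<arrow::Buffer>> out;
  for (const auto& column : table.columns()) {
    for (const auto& chunk : column->chunks()) {
      for (const auto& buf : chunk->data()->buffers) {
        if (buf != nullptr) out.push_back(buf);
      }
    }
  }
  return out;
}

// Variant 2: serialize into one preallocated buffer. The serialized size
// is measured first with a MockOutputStream (which only counts bytes).
arrow::Result<std::shared_ptr<arrow::Buffer>> SerializePreallocated(
    const arrow::Table& table) {
  arrow::io::MockOutputStream counter;
  ARROW_ASSIGN_OR_RAISE(auto sizer,
                        arrow::ipc::MakeStreamWriter(&counter, table.schema()));
  ARROW_RETURN_NOT_OK(sizer->WriteTable(table));
  ARROW_RETURN_NOT_OK(sizer->Close());
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Buffer> buf,
                        arrow::AllocateBuffer(counter.GetExtentBytesWritten()));
  arrow::io::FixedSizeBufferWriter sink(buf);
  ARROW_ASSIGN_OR_RAISE(auto writer,
                        arrow::ipc::MakeStreamWriter(&sink, table.schema()));
  ARROW_RETURN_NOT_OK(writer->WriteTable(table));
  ARROW_RETURN_NOT_OK(writer->Close());
  return buf;
}

// Variant 3: serialize into one dynamically grown buffer
// (BufferOutputStream grows its backing buffer as data is appended).
arrow::Result<std::shared_ptr<arrow::Buffer>> SerializeGrowing(
    const arrow::Table& table) {
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::BufferOutputStream::Create());
  ARROW_ASSIGN_OR_RAISE(auto writer,
                        arrow::ipc::MakeStreamWriter(sink, table.schema()));
  ARROW_RETURN_NOT_OK(writer->WriteTable(table));
  ARROW_RETURN_NOT_OK(writer->Close());
  return sink->Finish();
}

The first variant never touches the data, which presumably is why it is effectively free; the other two pay for copying the full ~1.7GB (plus reallocations in the growing case). The timings: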
Going to many buffers (just collecting the buffers): ~11,000 ns
Going to one preallocated buffer: ~160,000,000 ns
Going to one dynamically allocated buffer (using a grow factor of 2x): ~2,000,000,000 ns

On Thu, Jun 10, 2021 at 11:46 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> To be clear, we would like to help make this faster. I don't recall
> much effort being invested in optimizing this code path in the last
> couple of years, so there may be some low-hanging fruit to improve the
> performance. Changing the in-memory data layout (the chunking) is one
> of the most likely things to help.
>
> On Thu, Jun 10, 2021 at 2:14 PM Gosh Arzumanyan <gosh...@gmail.com> wrote:
> >
> > Hi Jayjeet,
> >
> > I wonder if you really need to serialize the whole table into a single
> > buffer, as you will end up with twice the memory, while you could be
> > sending chunks as they are generated by the RecordBatchStreamWriter.
> > Also, is the buffer resized beforehand? I'd suspect there might be
> > relocations happening under the hood.
> >
> > Cheers,
> > Gosh
> >
> > On Thu., 10 Jun. 2021, 21:01 Wes McKinney, <wesmck...@gmail.com> wrote:
> >
> > > hi Jayjeet - have you run a profiler to see where those 1000ms are
> > > being spent? How many arrays (the sum of the number of chunks across
> > > all columns) are there in total? I would guess that the problem is all
> > > the little Buffer memcopies. I don't think that the C Interface is
> > > going to help you.
> > >
> > > - Wes
> > >
> > > On Thu, Jun 10, 2021 at 1:48 PM Jayjeet Chakraborty
> > > <jayjeetchakrabort...@gmail.com> wrote:
> > > >
> > > > Hello Arrow Community,
> > > >
> > > > I am a student working on a project where I need to serialize an
> > > > in-memory Arrow Table of size around 700MB to a uint8_t* buffer. I am
> > > > currently using the arrow::ipc::RecordBatchStreamWriter API to
> > > > serialize the table to an arrow::Buffer, but it is taking nearly
> > > > 1000ms to serialize the whole table, and that is harming the
> > > > performance of my performance-critical application. I basically want
> > > > to get hold of the underlying memory of the table as bytes and send
> > > > it over the network. How do you suggest I tackle this problem? I was
> > > > thinking of using the C Data Interface for this, so that I convert my
> > > > arrow::Table to ArrowArray and ArrowSchema and serialize the structs
> > > > to send them over the network, but it seems like serializing structs
> > > > is another complex problem on its own. It would be great to have some
> > > > suggestions on this. Thanks a lot.
> > > >
> > > > Best,
> > > > Jayjeet
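P.S. For what it's worth, the "send chunks as they are generated" idea Gosh mentions above could look roughly like the sketch below. This is only a sketch: `sink` is a placeholder for whatever network-backed arrow::io::OutputStream the application provides, and the chunk size is an arbitrary example value.

#include <memory>
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>

arrow::Status StreamTable(const arrow::Table& table,
                          std::shared_ptr<arrow::io::OutputStream> sink) {
  // One stream writer over the (assumed) network sink; batches are written
  // as they are produced, so no single large intermediate buffer is built.
  ARROW_ASSIGN_OR_RAISE(auto writer,
                        arrow::ipc::MakeStreamWriter(sink, table.schema()));
  arrow::TableBatchReader reader(table);
  reader.set_chunksize(64 * 1024);  // rows per batch; tune for the workload
  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(reader.ReadNext(&batch));
    if (batch == nullptr) break;  // end of table
    ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  }
  return writer->Close();
}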