Hello,

I am using the C++ API to serialize and centralize data over the network. I
am wondering if I am using the API in an efficient way.

I have multiple nodes and a coordinator communicating over the network. I
do not have fine control over the network communication. Individual nodes
write one chunk of data to the network. The coordinator will receive all
the chunks and can loop over them.

On each node I do the following (the code is here
<https://github.com/Paradigm4/accelerated_io_tools/blob/e02aa37eb464d2eae501a36e4297adb28467f311/src/PhysicalAioSave.cpp#L512>
):

   1. Append data to Builders
   2. Finish Builders and get Arrays
   3. Create Record Batch from Arrays
   4. Create Pool Buffer
   5. Create Buffer Output Stream using Pool Buffer
   6. Open Record Batch Stream Writer using Buffer Output Stream
   7. Write Record Batch to writer
   8. Write Buffer data to network

On the coordinator I do the following (the code is here
<https://github.com/Paradigm4/accelerated_io_tools/blob/e02aa37eb464d2eae501a36e4297adb28467f311/src/PhysicalAioSave.cpp#L1000>
):

   1. Open File Output Stream
   2. Open Record Batch Stream Writer using File Output Stream
   3. For each chunk retrieved from the network
      1. Create Buffer Reader using data retrieved from the network (the
      code is here
      <https://github.com/Paradigm4/accelerated_io_tools/blob/e02aa37eb464d2eae501a36e4297adb28467f311/src/PhysicalAioSave.cpp#L1054>
      )
      2. Create Record Batch Stream Reader using Buffer Reader and read
      Record Batch (I plan to use ReadRecordBatch, but I'm having issues
      like in ARROW-2189)
      3. Write Record Batch

A few questions:

   - Does this look decent? Could the API be used more efficiently to
   achieve the same goal?
   - On the node side, do steps #3 and #7 copy data?
   - On the coordinator side, do steps #3.2 and #3.3 copy data?
   - On the coordinator side, do I really need to read and write a record
   batch? Could I copy the buffer directly somehow?

Thank you so much!
Rares



(Could the same Pool Buffer be reused across calls?)
