Hi Aldrin,

My apologies; I never saw this message from you until now, but thanks belatedly! Your guess was more or less on the mark.
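For anyone who finds this thread in the archives: since the culprit was roughly what Aldrin guessed, the data behind the no-copy Buffer wrap not staying alive, here is a minimal sketch of the kind of fix that guess points at, copying the received bytes into an owning arrow::Buffer before opening the reader. The function name and parameters are illustrative, not my actual code.

#include <cstring>
#include <memory>
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>

// bufferPtr/bufferSize arrive from the channel as before. The key change is
// copying into an owning Buffer rather than wrapping the raw pointer with
// Buffer(bufferPtr, bufferSize), whose backing memory must stay valid for as
// long as the reader uses it.
arrow::Result<std::shared_ptr<arrow::RecordBatch>> ReadBatch(
    const uint8_t* bufferPtr, int64_t bufferSize) {
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Buffer> owned,
                        arrow::AllocateBuffer(bufferSize));
  std::memcpy(owned->mutable_data(), bufferPtr, bufferSize);
  auto bufferReader = std::make_shared<arrow::io::BufferReader>(owned);
  ARROW_ASSIGN_OR_RAISE(auto batchReader,
                        arrow::ipc::RecordBatchStreamReader::Open(bufferReader));
  std::shared_ptr<arrow::RecordBatch> batch;
  ARROW_RETURN_NOT_OK(batchReader->ReadNext(&batch));
  return batch;
}

The no-copy wrap itself is fine when whatever owns bufferPtr is guaranteed to outlive the reader and any record batches produced from it; the copy just makes the ownership explicit.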
Best,
Jack

On Tue, Mar 4, 2025 at 11:27 PM Aldrin <octalene....@pm.me> wrote:

> Hi Jack!
>
> I tried the code you provided and it seemed to work for me. I put my
> code in a gist [1] for you to compare against your own. I don't use
> THROW_NOT_OK simply because I figured it wouldn't be necessary to try
> that as well (I assume it's either your own macro or something you can
> easily replace with ARROW_ASSIGN_OR_RAISE).
>
> I provide a version that uses the typical Arrow macros
> (`ARROW_ASSIGN_OR_RAISE`) as well as a version that manually gets the
> Result/Status and checks ok().
>
> I also tried using the Buffer::data() and Buffer::size() functions to
> replicate your use of that constructor of Buffer
> (`Buffer::Buffer(const uint8_t* data, int64_t size)`).
>
> I didn't see any issues. So I kind of suspect that, because of the
> Buffer constructor you're using, maybe you somehow don't keep the
> Buffer data alive? Maybe *most* of the data is actually still there,
> but some garbage has been written over part of it? Not totally sure,
> but it seems a reasonable culprit.
>
> [1]: https://gist.github.com/drin/0386e326a0e1e5d8b079c20e1f81bb0f
>
>
> # ------------------------------
> # Aldrin
>
> https://github.com/drin/
> https://gitlab.com/octalene
> https://keybase.io/octalene
>
> On Tuesday, March 4th, 2025 at 13:34, Jack Wimberley via user
> <user@arrow.apache.org> wrote:
>
> Hello all,
>
> I am attempting to serialize and then deserialize individual
> RecordBatch objects using the C++ library. However, I’m getting an
> “Invalid” result on the deserialization end.
>
> On the serialization end, with the help of some THROW_NOT_OK methods
> that throw on a non-OK Status or Result (and, in the latter case,
> return the inner Value), I’m serializing using
>
> // batch is a valid std::shared_ptr<RecordBatch>
> auto bufferStream = THROW_NOT_OK(io::BufferOutputStream::Create(1024));
> auto batchWriter = THROW_NOT_OK(ipc::MakeStreamWriter(bufferStream, batch->schema()));
> THROW_NOT_OK(batchWriter->WriteRecordBatch(*batch));
> THROW_NOT_OK(batchWriter->Close());
> auto batchBuffer = THROW_NOT_OK(bufferStream->Finish());
>
> // pass this data along
>
> The size of the buffer thus created is 1800. On the other end of the
> channel, I try to deserialize an in-memory copy of this IPC data using
>
> // bufferPtr is a uint8_t* const location in memory and bufferSize a
> // number of bytes
> auto arrowBuffer = std::make_shared<Buffer>(bufferPtr, bufferSize); // no-copy wrap
> auto bufferReader = std::make_shared<io::BufferReader>(arrowBuffer);
> auto batchReader = THROW_NOT_OK(ipc::RecordBatchStreamReader::Open(bufferReader));
>
> But the last step fails, with a non-OK result carrying the message
>
> Invalid: Expected to read 165847040 metadata bytes, but only read 1796
>
> The metadata size is way off, given that the serialized RecordBatch
> was 1800 bytes to begin with, while the number of bytes read looks
> about right, modulo that difference of 4. I saw some similar questions
> in the archives and online, but the issue in those tended to be a
> missing Close() step. Another suggestion was a mismatch between the
> reader and writer formats; I am using what look to me to be
> appropriately paired IPC stream I/O objects. Does some sort of header
> need to be written to the stream before the RecordBatch? Or, given
> that I did not use the second overloaded WriteRecordBatch method that
> takes a metadata object as its second argument, and the message
> mentions metadata bytes, is that relevant?
>
> Best,
>
> Jack Wimberley
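P.S., also for the archives: the stream writer already writes the schema header for you, and the WriteRecordBatch overload taking custom metadata is, as far as I can tell, unrelated to that error; a nonsense metadata length is just what you would expect when the reader parses bytes that have been freed or overwritten. Below is a self-contained round trip that sidesteps the lifetime pitfall by holding the shared_ptr<Buffer> from Finish() on the read side as well. This is a sketch using the stock Arrow macros rather than THROW_NOT_OK, and RoundTrip is an illustrative name.

#include <memory>
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>

arrow::Status RoundTrip(const std::shared_ptr<arrow::RecordBatch>& batch) {
  // Write side: MakeStreamWriter emits the schema message up front, and
  // Close() writes the end-of-stream marker.
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::BufferOutputStream::Create(1024));
  ARROW_ASSIGN_OR_RAISE(auto writer,
                        arrow::ipc::MakeStreamWriter(sink, batch->schema()));
  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  ARROW_RETURN_NOT_OK(writer->Close());
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Buffer> payload, sink->Finish());

  // Read side: the shared_ptr keeps the serialized bytes alive for as long
  // as the reader (and any batches read from it) needs them.
  auto source = std::make_shared<arrow::io::BufferReader>(payload);
  ARROW_ASSIGN_OR_RAISE(auto reader,
                        arrow::ipc::RecordBatchStreamReader::Open(source));
  std::shared_ptr<arrow::RecordBatch> roundTripped;
  ARROW_RETURN_NOT_OK(reader->ReadNext(&roundTripped));
  return roundTripped->Equals(*batch)
             ? arrow::Status::OK()
             : arrow::Status::Invalid("round trip mismatch");
}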