Hi Jack! I tried the code you provided and it worked for me. I put my code in a gist [1] for you to compare against your own. I didn't use `THROW_NOT_OK`, since I figured it wasn't necessary to test that variant as well (I assume it's either your own macro or something you can easily replace with `ARROW_ASSIGN_OR_RAISE`).
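
For reference, here is a minimal sketch of the round trip I tested, roughly the shape of the macro-based version in the gist (the `RoundTrip` wrapper and the final equality check are just my framing here, not literal gist code):

    #include <arrow/api.h>
    #include <arrow/io/api.h>
    #include <arrow/ipc/api.h>

    // Round trip a RecordBatch through the IPC stream format.
    arrow::Status RoundTrip(const std::shared_ptr<arrow::RecordBatch>& batch) {
      // serialize: stream writer -> in-memory buffer
      ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::BufferOutputStream::Create(1024));
      ARROW_ASSIGN_OR_RAISE(auto writer,
                            arrow::ipc::MakeStreamWriter(sink, batch->schema()));
      ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
      ARROW_RETURN_NOT_OK(writer->Close());
      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Buffer> ipc_buffer, sink->Finish());

      // deserialize: ipc_buffer owns its memory, so no lifetime concerns here
      auto source = std::make_shared<arrow::io::BufferReader>(ipc_buffer);
      ARROW_ASSIGN_OR_RAISE(auto reader,
                            arrow::ipc::RecordBatchStreamReader::Open(source));
      ARROW_ASSIGN_OR_RAISE(auto roundtripped, reader->Next());

      return roundtripped->Equals(*batch)
                 ? arrow::Status::OK()
                 : arrow::Status::Invalid("batches differ after round trip");
    }

The manual version does the same thing, just with each macro replaced by fetching the Result, checking `ok()`, and moving the value out.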
The gist provides a version that uses the typical Arrow macros (`ARROW_ASSIGN_OR_RAISE`), as sketched above, as well as a version that manually gets the Result/Status and checks `ok()`. I also used `Buffer::data()` and `Buffer::size()` to replicate the `Buffer` constructor you're using (`Buffer::Buffer(const uint8_t* data, int64_t size)`). I didn't see any issues with either approach.

So, I suspect that because of that `Buffer` constructor, which wraps the memory without copying it or taking ownership, the buffer's data may not be kept alive on your end. In that case most of the data would still be intact, but some garbage could have been written over part of it, which would fit the bogus metadata length in the error. Not totally sure, but it seems a reasonable culprit; see the sketch after your quoted message below.

[1]: https://gist.github.com/drin/0386e326a0e1e5d8b079c20e1f81bb0f

# ------------------------------
# Aldrin

https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene

On Tuesday, March 4th, 2025 at 13:34, Jack Wimberley via user <user@arrow.apache.org> wrote:

> Hello all,
>
> I am attempting to serialize and then deserialize individual RecordBatch
> objects, using the C++ library. However, I’m getting an “Invalid” result on
> the deserialization end.
>
> On the serialization end, with the help of some methods THROW_NOT_OK that
> throw on non-OK Status and Result (and in the latter case return the inner
> Value), I’m serializing using
>
> // batch is a valid std::shared_ptr<RecordBatch>
> auto bufferStream = THROW_NOT_OK(io::BufferOutputStream::Create(1024));
> auto batchWriter = THROW_NOT_OK(ipc::MakeStreamWriter(bufferStream, batch->schema()));
> auto writeStatus = THROW_NOT_OK(batchWriter->WriteRecordBatch(*batch));
> THROW_NOT_OK(batchWriter->Close());
> auto batchBuffer = THROW_NOT_OK(bufferStream->Finish());
>
> // pass this data along
>
> The size of the buffer thus created is 1800. On the other end of the channel,
> I try to deserialize an in-memory copy of this IPC data using
>
> // bufferPtr is a uint8_t* const location in memory and bufferSize a number of bytes
> auto arrowBuffer = std::make_shared<Buffer>(bufferPtr, bufferSize); // no-copy wrap
> auto bufferReader = std::make_shared<io::BufferReader>(arrowBuffer);
> auto batchReader = THROW_NOT_OK(ipc::RecordBatchStreamReader::Open(bufferReader));
>
> But, the last step fails, with a non-OK result with message
>
> Invalid: Expected to read 165847040 metadata bytes, but only read 1796
>
> The metadata bytes size is way off, given the serialized RecordBatch was 1800
> bytes to begin with. The number of bytes read looks about right, modulo that
> difference of 4. I saw some similar questions in the archives and online but
> the issues in them tended to be that the Close() step was missing. Other
> suggestions are a mismatch in the reader/writer format; I am using ones that
> look to me to be appropriately paired IPC stream I/O objects. Does some sort
> of header need to be written to the stream before the RecordBatch? Or, I did
> not use the second overloaded WriteRecordBatch method that takes a metadata
> object as the second argument, and the message mentions metadata bytes; is
> that relevant?
>
> Best,
>
> Jack Wimberley
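
P.S. To illustrate the lifetime concern above: the `Buffer(const uint8_t*, int64_t)` constructor only wraps the pointer; it neither copies the bytes nor takes ownership of them. Here is a minimal sketch of a defensive variant that copies into an Arrow-owned buffer before opening the reader (the `OpenReader` wrapper and the `bufferPtr`/`bufferSize` names are placeholders for however you actually receive the data):

    #include <cstring>  // std::memcpy
    #include <arrow/api.h>
    #include <arrow/io/api.h>
    #include <arrow/ipc/api.h>

    arrow::Result<std::shared_ptr<arrow::ipc::RecordBatchStreamReader>>
    OpenReader(const uint8_t* bufferPtr, int64_t bufferSize) {
      // Copy into a buffer whose memory Arrow owns, so the reader cannot be
      // affected if the allocation behind bufferPtr is freed or reused.
      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Buffer> owned,
                            arrow::AllocateBuffer(bufferSize));
      std::memcpy(owned->mutable_data(), bufferPtr, static_cast<size_t>(bufferSize));

      auto source = std::make_shared<arrow::io::BufferReader>(owned);
      return arrow::ipc::RecordBatchStreamReader::Open(source);
    }

If this copying variant reads your 1800 bytes fine while the no-copy wrap fails, that points squarely at the lifetime of the memory behind bufferPtr.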