Hi Jack!

I tried the code you provided and it seemed to work for me. I put my code in 
a gist [1] for you to compare against your own. I didn't use `THROW_NOT_OK`, 
simply because I figured it wasn't necessary to test that as well (I assume 
it's either your own macro or something you can easily replace with 
`ARROW_ASSIGN_OR_RAISE`).
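
For reference, the round trip in the gist looks roughly like the following 
(a condensed sketch, not the gist verbatim; the function name is illustrative, 
`batch` is assumed to be a valid `std::shared_ptr<arrow::RecordBatch>`, and 
`ARROW_ASSIGN_OR_RAISE` requires the enclosing function to return a `Status` 
or `Result`):

    #include <arrow/api.h>
    #include <arrow/io/api.h>
    #include <arrow/ipc/api.h>

    arrow::Status RoundTrip(const std::shared_ptr<arrow::RecordBatch>& batch) {
      // serialize the batch to an in-memory IPC stream
      ARROW_ASSIGN_OR_RAISE(auto bufferStream, arrow::io::BufferOutputStream::Create(1024));
      ARROW_ASSIGN_OR_RAISE(auto batchWriter,
                            arrow::ipc::MakeStreamWriter(bufferStream, batch->schema()));
      ARROW_RETURN_NOT_OK(batchWriter->WriteRecordBatch(*batch));
      ARROW_RETURN_NOT_OK(batchWriter->Close());
      ARROW_ASSIGN_OR_RAISE(auto batchBuffer, bufferStream->Finish());

      // deserialize from the same buffer
      auto bufferReader = std::make_shared<arrow::io::BufferReader>(batchBuffer);
      ARROW_ASSIGN_OR_RAISE(auto batchReader,
                            arrow::ipc::RecordBatchStreamReader::Open(bufferReader));
      ARROW_ASSIGN_OR_RAISE(auto roundTripped, batchReader->Next());
      return arrow::Status::OK();
    }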

I provide a version that uses the typical Arrow macros (`ARROW_ASSIGN_OR_RAISE`) 
as well as a version that manually gets the `Result`/`Status` and checks 
`ok()`.
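
The manual variant just repeats the same pattern for each call (a sketch of 
the first few steps only; `batch` as above):

    auto streamResult = arrow::io::BufferOutputStream::Create(1024);
    if (!streamResult.ok()) { /* handle streamResult.status() */ }
    auto bufferStream = streamResult.ValueOrDie();

    auto writerResult = arrow::ipc::MakeStreamWriter(bufferStream, batch->schema());
    if (!writerResult.ok()) { /* handle writerResult.status() */ }
    auto batchWriter = writerResult.ValueOrDie();

    // the write/close calls return a plain Status rather than a Result
    auto writeStatus = batchWriter->WriteRecordBatch(*batch);
    if (!writeStatus.ok()) { /* handle writeStatus */ }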

I also used the `Buffer::data()` and `Buffer::size()` functions to replicate 
your use of that particular `Buffer` constructor (`Buffer::Buffer(const 
uint8_t* data, int64_t size)`).
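
Concretely, that part of my test looks something like this (a sketch; 
`batchBuffer` is the buffer produced by `Finish()` above):

    const uint8_t* bufferPtr  = batchBuffer->data();
    const int64_t  bufferSize = batchBuffer->size();
    // non-owning wrap, same constructor as in your snippet
    auto arrowBuffer = std::make_shared<arrow::Buffer>(bufferPtr, bufferSize);

Note that in my test `batchBuffer` stays in scope the whole time, so the 
wrapped bytes are guaranteed to stay alive.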

I didn't see any issues. So, I suspect that because of the `Buffer` 
constructor you're using (a no-copy wrap that doesn't take ownership), you 
somehow aren't keeping the buffer's data alive. Maybe most of the data is 
actually still there, but some garbage has been written over part of it? I'm 
not totally sure, but it seems a reasonable culprit.
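
If that is what's happening, one way to rule it out is to copy the bytes into 
an Arrow-owned buffer before opening the reader (a sketch, assuming 
`bufferPtr`/`bufferSize` as in your snippet; needs `<cstring>` for `memcpy`):

    // copy the incoming bytes into an Arrow-owned allocation
    auto ownedResult = arrow::AllocateBuffer(bufferSize);
    if (!ownedResult.ok()) { /* handle ownedResult.status() */ }
    std::shared_ptr<arrow::Buffer> arrowBuffer = std::move(ownedResult).ValueOrDie();
    std::memcpy(arrowBuffer->mutable_data(), bufferPtr, bufferSize);
    // arrowBuffer now owns the copy, so the lifetime of bufferPtr no longer matters

If deserialization succeeds with the copy but not with the no-copy wrap, then 
the original data's lifetime is the problem.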

[1]: https://gist.github.com/drin/0386e326a0e1e5d8b079c20e1f81bb0f



# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene

https://keybase.io/octalene


On Tuesday, March 4th, 2025 at 13:34, Jack Wimberley via user 
<user@arrow.apache.org> wrote:

> Hello all,
> 
> I am attempting to serialize and then deserialize individual RecordBatch 
> objects, using the C++ library. However, I’m getting an “Invalid” result on 
> the deserialization end.
> 
> On the serialization end, with the help of some helper methods THROW_NOT_OK 
> that throw on a non-OK Status or Result (and in the latter case return the 
> inner Value), I'm serializing using
> 
> // batch is a valid std::shared_ptr<RecordBatch>
> auto bufferStream = THROW_NOT_OK(io::BufferOutputStream::Create(1024));
> auto batchWriter = THROW_NOT_OK(ipc::MakeStreamWriter(bufferStream, batch->schema()));
> auto writeStatus = THROW_NOT_OK(batchWriter->WriteRecordBatch(*batch));
> THROW_NOT_OK(batchWriter->Close());
> auto batchBuffer = THROW_NOT_OK(bufferStream->Finish());
> 
> // pass this data along
> 
> The size of the buffer thus created is 1800. On the other end of the channel, 
> I try to deserialize an in-memory copy of this IPC data using
> 
> // bufferPtr is a uint8_t* const location in memory and bufferSize a number of bytes
> auto arrowBuffer = std::make_shared<Buffer>(bufferPtr, bufferSize); // no-copy wrap
> auto bufferReader = std::make_shared<io::BufferReader>(arrowBuffer);
> auto batchReader = THROW_NOT_OK(ipc::RecordBatchStreamReader::Open(bufferReader));
> 
> But the last step fails with a non-OK result with the message
> 
> Invalid: Expected to read 165847040 metadata bytes, but only read 1796
> 
> The metadata bytes size is way off, given that the serialized RecordBatch was 
> 1800 bytes to begin with. The number of bytes read looks about right, modulo 
> that difference of 4. I saw some similar questions in the archives and online, 
> but the issues in them tended to be that the Close() step was missing. Another 
> suggestion was a mismatch in the reader/writer format, but I am using what 
> look to me to be appropriately paired IPC stream I/O objects. Does some sort 
> of header need to be written to the stream before the RecordBatch? Or, since I 
> did not use the second overloaded WriteRecordBatch method that takes a 
> metadata object as the second argument, and the message mentions metadata 
> bytes, is that relevant?
> 
> Best,
> 
> Jack Wimberley
