Hi Aldrin,

My apologies, I never saw this message from you, but thanks belatedly! Your
guess was more or less on the mark.

Best,

Jack

On Tue, Mar 4, 2025 at 11:27 PM Aldrin <octalene....@pm.me> wrote:

> Hi Jack!
>
> I tried the code you provided and it seemed to work for me. I put my
> code in a gist [1] for you to compare against your own. I don't use
> THROW_NOT_OK simply because I figured it wouldn't be necessary to try
> that as well (I assume it's either your own macro or something you can
> easily swap in for ARROW_ASSIGN_OR_RAISE).
>
> I provide a version that uses the typical Arrow macros
> (`ARROW_ASSIGN_OR_RAISE`) as well as a version that manually gets the
> result/status and checks ok().
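>
> For concreteness, a minimal sketch of the two styles (illustrative
> variable names, not copied from the gist):
>
> // (1) manually check the Result, then unwrap it
> auto maybe_stream = arrow::io::BufferOutputStream::Create(1024);
> if (!maybe_stream.ok()) {
>   // inspect maybe_stream.status() and bail out
> }
> auto stream = maybe_stream.ValueOrDie();
>
> // (2) the usual Arrow macro; requires the enclosing function to return
> // arrow::Status or arrow::Result<T>
> ARROW_ASSIGN_OR_RAISE(auto stream2, arrow::io::BufferOutputStream::Create(1024));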
>
> I also tried using the Buffer::data() and Buffer::size() functions to
> replicate the fact that you use that constructor of Buffer
> (`Buffer::Buffer(const uint8_t* data, int64_t size)`).
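>
> In sketch form, the replication looked roughly like this (batchBuffer
> being the buffer your Finish() call produces):
>
> // non-owning view over bytes owned elsewhere; the owner (batchBuffer)
> // must outlive this view
> auto view = std::make_shared<arrow::Buffer>(batchBuffer->data(), batchBuffer->size());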
>
> I didn't see any issues. So I somewhat suspect that, because of the
> Buffer constructor you're using, maybe you somehow don't keep the Buffer
> data alive? In that case maybe *most* of the data is actually still
> there, but some garbage has been written over part of it? Not totally
> sure, but it seems a reasonable culprit.
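>
> If that's what's happening, one workaround (a sketch under that
> assumption, not a confirmed diagnosis) is to copy the bytes into an
> Arrow-owned buffer before opening the reader:
>
> // needs <arrow/buffer.h>, <arrow/io/memory.h>, <cstring>
> ARROW_ASSIGN_OR_RAISE(std::unique_ptr<arrow::Buffer> owned,
>                       arrow::AllocateBuffer(bufferSize));
> std::memcpy(owned->mutable_data(), bufferPtr, bufferSize);
> // the reader now co-owns the copy, so the original bytes can go away
> auto bufferReader = std::make_shared<arrow::io::BufferReader>(
>     std::shared_ptr<arrow::Buffer>(std::move(owned)));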
>
> [1]: https://gist.github.com/drin/0386e326a0e1e5d8b079c20e1f81bb0f
>
>
> # ------------------------------
> # Aldrin
>
> https://github.com/drin/
> https://gitlab.com/octalene
> https://keybase.io/octalene
>
> On Tuesday, March 4th, 2025 at 13:34, Jack Wimberley via user
> <user@arrow.apache.org> wrote:
>
> Hello all,
>
> I am attempting to serialize and then deserialize individual RecordBatch
> objects, using the C++ library. However, I’m getting an “Invalid” result on
> the deserialization end.
>
> On the serialization end, with the help of a couple of helpers named
> THROW_NOT_OK that throw on a non-OK Status or Result (and in the latter
> case return the inner Value), I'm serializing using
>
> // batch is a valid std::shared_ptr<RecordBatch>
> auto bufferStream = THROW_NOT_OK(io::BufferOutputStream::Create(1024));
> auto batchWriter =
>     THROW_NOT_OK(ipc::MakeStreamWriter(bufferStream, batch->schema()));
> THROW_NOT_OK(batchWriter->WriteRecordBatch(*batch));
> THROW_NOT_OK(batchWriter->Close());
> auto batchBuffer = THROW_NOT_OK(bufferStream->Finish());
>
> // pass this data along
>
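> For reference, THROW_NOT_OK is our own shorthand, roughly along these
> lines (a sketch of the idea, not an Arrow API):
>
> // needs <arrow/result.h>, <arrow/status.h>, <stdexcept>
> template <typename T>
> T THROW_NOT_OK(arrow::Result<T> result) {
>   if (!result.ok()) throw std::runtime_error(result.status().ToString());
>   return result.MoveValueUnsafe();
> }
>
> inline void THROW_NOT_OK(const arrow::Status& status) {
>   if (!status.ok()) throw std::runtime_error(status.ToString());
> }
>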
> The size of the buffer thus created is 1800. On the other end of the
> channel, I try to deserialize an in-memory copy of this IPC data using
>
>
> // bufferPtr is a const uint8_t* pointing at the data and bufferSize its
> // length in bytes
> auto arrowBuffer = std::make_shared<Buffer>(bufferPtr, bufferSize); // no-copy wrap
> auto bufferReader = std::make_shared<io::BufferReader>(arrowBuffer);
> auto batchReader =
>     THROW_NOT_OK(ipc::RecordBatchStreamReader::Open(bufferReader));
>
>
> But the last step fails with a non-OK result whose message is
>
> Invalid: Expected to read 165847040 metadata bytes, but only read 1796
>
>
> The metadata size is way off, given that the serialized RecordBatch was
> only 1800 bytes to begin with. The number of bytes read looks about
> right, modulo that difference of 4. I saw some similar questions in the
> archives and online, but the issue there tended to be a missing Close()
> step. Another suggestion was a mismatch between the reader and writer
> formats; the ones I'm using look to me like appropriately paired IPC
> stream I/O objects. Does some sort of header need to be written to the
> stream before the RecordBatch? Also, I did not use the second
> WriteRecordBatch overload, which takes a metadata object as its second
> argument, and the error message mentions metadata bytes; is that
> relevant?
>
> Best,
>
> Jack Wimberley
>
>
>
