Hi Rusty,
On the face of it, this looks like a reasonable idea, though I wonder if
it should be a separate message type *or* an optional field carried
within RecordBatch messages.
(I would perhaps also call it "application data" or something)
Regards
Antoine.
On 03/02/2026 at 17:31, Rusty Conover wrote:
Hi Arrow Friends,
I’ve really appreciated Arrow Flight’s ability to carry custom metadata
messages alongside record batches. In some of my current work, however, I’m
dealing with Arrow IPC streams that are *not* sent via Flight, and I’d like to
have a comparable capability there as well.
To support this, I’d like to propose adding a new IPC message type, tentatively
named `OpaqueBytes`, that would allow arbitrary bytes to be embedded directly
within IPC streams. IPC readers that do not understand this message type could
safely ignore it, preserving compatibility.
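The "safely ignore it" property depends on a reader being able to skip a message it does not recognize without parsing the message body. Below is a minimal sketch of that idea using a toy framing, *not* the real Arrow IPC encapsulation. Each message here is a 1-byte type code plus a 4-byte body length. In actual Arrow IPC, the body length lives in the `bodyLength` field of the Message flatbuffer. The type codes and the `OPAQUE_BYTES` value are hypothetical.

```python
import io
import struct

# Toy framing (NOT the Arrow IPC wire format): 1-byte type code,
# 4-byte little-endian body length, then the body bytes.
HEADER = "<BI"
HEADER_SIZE = struct.calcsize(HEADER)  # 5 bytes, no padding with "<"

KNOWN_TYPES = {1: "RecordBatch"}  # hypothetical code for a record batch
OPAQUE_BYTES = 99                 # hypothetical code for the proposed message


def write_message(out, type_code, body):
    """Append one framed message to a binary stream."""
    out.write(struct.pack(HEADER, type_code, len(body)))
    out.write(body)


def read_known_messages(stream):
    """Yield (name, body) for recognized messages and skip unknown ones."""
    while True:
        header = stream.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            return  # end of stream
        type_code, length = struct.unpack(HEADER, header)
        body = stream.read(length)  # consume the body either way
        if type_code in KNOWN_TYPES:
            yield KNOWN_TYPES[type_code], body
        # Unknown type: the length prefix has already told us how many
        # bytes to consume, so the reader simply moves on.


buf = io.BytesIO()
write_message(buf, 1, b"batch-0")
write_message(buf, OPAQUE_BYTES, b"log: step 1 done")
write_message(buf, 1, b"batch-1")
buf.seek(0)
print(list(read_known_messages(buf)))
# [('RecordBatch', b'batch-0'), ('RecordBatch', b'batch-1')]
```

A reader that does understand the new type would add its code to `KNOWN_TYPES`, or handle it in a separate branch. Older readers would keep working unchanged, provided they skip unknown types instead of raising an error.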
My motivation is to enable multiplexing of auxiliary messages within a stream
that otherwise consists of schemas, dictionaries, and record batches. A
concrete example would be interleaving logging or signaling messages with
record batches. Today, I’m approximating this by emitting zero-row record
batches with binary metadata, but this approach is awkward and incurs
unnecessary overhead due to schema complexity.
An `OpaqueBytes` IPC message type could enable a range of use cases, including
(but not limited to) logging, flow control, signaling, and other auxiliary
communication needs that don’t naturally map to record batches.
I briefly discussed this idea a few weeks ago on the Apache Arrow call, but
wanted to share it here to reach a broader audience and gather more feedback.
In addition to the message type itself, I’d also be interested in hearing
thoughts on how PyArrow’s interfaces might be extended to allow users to read
and write these arbitrary messages as part of existing IPC stream readers and
writers.
Looking forward to your thoughts and discussion.
Kind regards,
Rusty