Hi Arrow Friends,

I’ve really appreciated Arrow Flight’s ability to carry custom metadata 
messages alongside record batches. In some of my current work, however, I’m 
dealing with Arrow IPC streams that are *not* sent via Flight, and I’d like to 
have a comparable capability there as well.

To support this, I’d like to propose adding a new IPC message type, tentatively 
named `OpaqueBytes`, that would allow arbitrary bytes to be embedded directly 
within IPC streams. IPC readers that do not understand this message type could 
safely ignore it, preserving compatibility.

My motivation is to enable multiplexing of auxiliary messages within a stream 
that otherwise consists of schemas, dictionaries, and record batches. A 
concrete example would be interleaving logging or signaling messages with 
record batches. Today, I’m approximating this by emitting zero-row record 
batches with binary metadata, but this approach is awkward and incurs 
unnecessary overhead due to schema complexity.

An `OpaqueBytes` IPC message type could enable a range of use cases, including 
(but not limited to) logging, flow control, signaling, and other auxiliary 
communication needs that don’t naturally map to record batches.

I briefly discussed this idea a few weeks ago on the Apache Arrow call, but 
wanted to share it here to reach a broader audience and gather more feedback.

In addition to the message type itself, I’d also be interested in hearing 
thoughts on how PyArrow’s interfaces might be extended to allow users to read 
and write these arbitrary messages as part of existing IPC stream readers and 
writers.

Looking forward to your thoughts and discussion.

Kind regards,
Rusty
