Hi Arrow Friends,

I’ve really appreciated Arrow Flight’s ability to carry custom metadata messages alongside record batches. In some of my current work, however, I’m dealing with Arrow IPC streams that are *not* sent via Flight, and I’d like to have a comparable capability there as well.

To support this, I’d like to propose adding a new IPC message type, tentatively named `OpaqueBytes`, that would allow arbitrary bytes to be embedded directly within IPC streams. IPC readers that do not understand this message type could safely ignore it, preserving compatibility.

My motivation is to enable multiplexing of auxiliary messages within a stream that otherwise consists of schemas, dictionaries, and record batches. A concrete example would be interleaving logging or signaling messages with record batches. Today, I’m approximating this by emitting zero-row record batches with binary metadata, but this approach is awkward and incurs unnecessary overhead due to schema complexity.

An `OpaqueBytes` IPC message type could enable a range of use cases, including (but not limited to) logging, flow control, signaling, and other auxiliary communication needs that don’t naturally map to record batches.

I briefly discussed this idea a few weeks ago on the Apache Arrow call, but wanted to share it here to reach a broader audience and gather more feedback. In addition to the message type itself, I’d also be interested in hearing thoughts on how PyArrow’s interfaces might be extended to allow users to read and write these arbitrary messages as part of existing IPC stream readers and writers.

Looking forward to your thoughts and discussion.

Kind regards,
Rusty
