Hi all,

Thanks for the thoughtful discussion; this has been really helpful to follow.
It seems to me that there are two slightly different (but related) needs being discussed:

1. The ability to carry *opaque, non-string bytes* in IPC, avoiding the overhead and semantic mismatch of zero-row RecordBatches and the string-only limitation of custom_metadata.
2. The ability to control the *association* and *ordering* of such data with respect to RecordBatch messages (sometimes tightly coupled, sometimes intentionally independent).

From that perspective, the combination Antoine suggested (a lightweight Empty message type plus an application_data bytes field) feels like a nice decomposition:

- Empty provides a low-overhead, ordering-preserving carrier when the data is intentionally independent.
- An application_data field on Message allows attaching bytes directly to a RecordBatch (or other messages) when tight association is desired (e.g., statistics, per-batch signals).

This also seems to align well with David’s point about avoiding base64, and with Dewey’s use cases where the payload is meaningful but doesn’t naturally fit in schema metadata.

One thing I like about this direction is that it keeps the initial scope focused: it doesn’t force multiplexing or structured interpretation up front, but it still leaves room to experiment (e.g., embedding serialized IPC in Empty, or evolving higher-level conventions later).

From an implementation point of view, it also seems feasible to prototype incrementally:

- introduce Empty + application_data (bytes) in the IPC format
- initially treat application_data as opaque and pass it through unchanged in readers/writers
- let higher-level libraries decide how (or whether) to interpret it

Happy to help with prototyping or reviewing pieces of this if that’s useful.
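To make the incremental plan above a bit more concrete, here is a minimal toy model of how the two carriers could coexist. It is a hypothetical sketch only: `RecordBatchMsg`, `EmptyMsg`, `pass_through`, and `collect_application_data` are invented names, not pyarrow APIs or Arrow IPC schema definitions, and the actual batch data/body is elided. It assumes application_data is an optional opaque byte string that readers/writers forward untouched, with any interpretation left to a higher-level consumer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple, Union

# Hypothetical stand-ins for IPC messages; none of these exist in pyarrow
# or in the Arrow IPC flatbuffer schema. They only mirror this proposal.


@dataclass(frozen=True)
class RecordBatchMsg:
    """Toy RecordBatch message (column data elided)."""
    num_rows: int
    # Proposed optional field: opaque bytes tightly associated with this batch.
    application_data: Optional[bytes] = None


@dataclass(frozen=True)
class EmptyMsg:
    """Toy version of the proposed lightweight Empty message (no rows, no schema)."""
    # Opaque bytes that are independent of any batch but keep their stream position.
    application_data: Optional[bytes] = None


Message = Union[RecordBatchMsg, EmptyMsg]


def pass_through(messages: Iterable[Message]) -> Iterator[Message]:
    """Reader/writer stage that forwards every message unchanged and in order,
    treating application_data as opaque (no decoding, no reordering)."""
    for msg in messages:
        yield msg


def collect_application_data(
    messages: Iterable[Message],
) -> List[Tuple[int, str, bytes]]:
    """A higher-level consumer that *chooses* to look at the payloads.
    Returns (stream position, carrier kind, payload) for each message with data."""
    out: List[Tuple[int, str, bytes]] = []
    for pos, msg in enumerate(messages):
        if msg.application_data is not None:
            kind = "attached" if isinstance(msg, RecordBatchMsg) else "independent"
            out.append((pos, kind, msg.application_data))
    return out


# Usage: one batch with attached bytes, one independent Empty, one plain batch.
stream = [
    RecordBatchMsg(num_rows=3, application_data=b"\x00\x01stats"),
    EmptyMsg(application_data=b"progress:50%"),
    RecordBatchMsg(num_rows=2),
]
forwarded = list(pass_through(stream))
print(collect_application_data(forwarded))
# [(0, 'attached', b'\x00\x01stats'), (1, 'independent', b'progress:50%')]
```

The point of keeping `pass_through` ignorant of the payload is that ordering is preserved without committing to any interpretation; whether data is "attached" or "independent" falls out of which message carries it.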
Best regards,
Vignesh

On Thu, 5 Feb 2026 at 14:17, Antoine Pitrou wrote:
>
> On 05/02/2026 at 04:44, Dewey Dunnington wrote:
> >> a new application_data field in the Message table to pass arbitrary
> >> opaque data with any kind of message
> >
> > I believe this could be done with the Empty message by putting the bytes in
> > the body instead of in the header. Probably the only place this
> > functionally makes a difference would be dissociated IPC where the body is
> > transported separately. Perhaps both are useful.
>
> I thought about that, but then it means the application data can only be
> transmitted in the new Empty message, not with a RecordBatch.
>
> That's not necessarily a problem, just a limitation to think about.
>
> >> It could be interesting to support multiplexing multiple IPC streams over
> >> the same socket
> >
> > I agree that there are some applications of the Empty where it would be
> > tempting to have the payload of the Empty be serialized IPC (e.g., if used
> > for statistics and the statistics are encoded in the lovely Arrow spec we
> > have for that). Perhaps with Empty one could prototype that.
>
> I hope we can find a way of transporting statistics together with the
> corresponding RecordBatch message, as opposed to a separate message in
> the IPC stream.
>
> Regards
>
> Antoine.
