On 03/02/2026 at 20:27, Rusty Conover wrote:
> Hi Antoine,
> It is nice to hear from you!
Ditto :-)
>> On the face of it, this looks like a reasonable idea, though I wonder if
>> it should be a separate message type *or* an optional field carried
>> together in RecordBatches.
> The main issue with carrying this in RecordBatch metadata is ordering. While
> IPC already supports `custom_metadata` via `write_batch` (which I’ve been
> using), that approach assumes the application data can be attached to a
> specific batch.
> In some cases, the application data and record batches are produced
> independently and cannot be cleanly associated. A concrete example is
> interleaving stderr output (arbitrary log messages) with record batches written
> to stdout, while preserving a single ordered IPC stream.
> I experimented with using zero-row record batches as a workaround, but this is
> inefficient: even with no rows, the serialized message size grows with schema
> complexity.
Ok, perhaps we can find a generic solution using two additions:
1) a new Empty message type to avoid the overhead (and semantics) of
empty record batches
2) a new application_data field in the Message table to pass arbitrary
opaque data with any kind of message
Something like:
https://gist.github.com/pitrou/363c4509706f56743f0ca0373f20949c
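To make the two additions concrete, here is a toy Python model of an ordered stream. It is only a sketch under loud assumptions: the framing below is invented for illustration and is not the Arrow IPC (Flatbuffers) encoding, and the names `MessageKind`, `Message`, `encode` and `decode_stream` are hypothetical. It only shows that an Empty message can carry opaque data at its own position in the stream, and that any other message can carry such data as well.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterator, Optional


class MessageKind(IntEnum):
    # Hypothetical tags for this sketch, not Arrow's MessageHeader values.
    SCHEMA = 1
    RECORD_BATCH = 2
    EMPTY = 3  # stands in for the proposed Empty message type


@dataclass
class Message:
    kind: MessageKind
    body: bytes = b""
    # Stands in for the proposed application_data field (opaque bytes).
    application_data: Optional[bytes] = None


# Toy frame header: kind, has-application-data flag, two uint32 lengths.
HEADER = struct.Struct("<BBII")


def encode(msg: Message) -> bytes:
    """Serialize one message as header + application data + body."""
    app = msg.application_data or b""
    has_app = 0 if msg.application_data is None else 1
    return HEADER.pack(msg.kind, has_app, len(app), len(msg.body)) + app + msg.body


def decode_stream(data: bytes) -> Iterator[Message]:
    """Yield messages in exactly the order they were written."""
    offset = 0
    while offset < len(data):
        kind, has_app, app_len, body_len = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        app = data[offset:offset + app_len]
        offset += app_len
        body = data[offset:offset + body_len]
        offset += body_len
        yield Message(MessageKind(kind), body, app if has_app else None)


# A log line produced between two batches gets its own slot in the stream,
# and its size does not depend on the schema.
stream = b"".join(encode(m) for m in [
    Message(MessageKind.SCHEMA, b"<schema>"),
    Message(MessageKind.RECORD_BATCH, b"<batch 0>"),
    Message(MessageKind.EMPTY, application_data=b"stderr: warming up"),
    Message(MessageKind.RECORD_BATCH, b"<batch 1>", application_data=b"stderr: done"),
])
decoded = list(decode_stream(stream))
```

In this toy model a reader that does not care about application data can simply skip `EMPTY` messages and ignore the extra field on the others.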
What do you think?
Regards
Antoine.