Hi,

We're looking at using Arrow as part of our solution for shipping tabular data between different streaming systems, potentially implemented with different technologies such as Spark, Beam, and Flink. Some of these systems have "watermarks" as a key concept. Briefly, a watermark is a promise that a particular data source will not produce any more events/rows with a timestamp at or before a given time. For example, if I produce a batch of rows every 5 minutes, then after I've finished sending the 12:00 batch, I would send a watermark update of 12:04:59, letting downstream consumers know that no future row from me will have a timestamp before 12:05.
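A minimal sketch of the arithmetic in the example above, assuming one-second timestamp resolution (so the last instant before 12:05 is 12:04:59) and reading the watermark, as the example does, as ruling out any timestamp at or before it. The function names and the date are illustrative only and do not come from Arrow, Spark, Beam, or Flink.

```python
from datetime import datetime, timedelta

# Assumption: batches are emitted every 5 minutes, timestamps have 1-second resolution.
BATCH_INTERVAL = timedelta(minutes=5)
RESOLUTION = timedelta(seconds=1)

def watermark_after_batch(batch_start: datetime) -> datetime:
    """Hypothetical helper: watermark to send once the batch starting at batch_start is fully sent."""
    return batch_start + BATCH_INTERVAL - RESOLUTION

def is_late(row_ts: datetime, watermark: datetime) -> bool:
    """Hypothetical helper: a row breaks the promise if its timestamp is at or before the watermark."""
    return row_ts <= watermark

# The date is arbitrary; only the time of day matters for the example.
wm = watermark_after_batch(datetime(2021, 1, 1, 12, 0))
print(wm.time())  # 12:04:59
print(is_late(datetime(2021, 1, 1, 12, 5), wm))  # False: 12:05 is still allowed
```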
We would like to be able to propagate watermarks along with our data, and I wondered whether anyone on this list has ideas for how to do this today, or whether it is on the roadmap for the Arrow compute API or something similar. We'd like to be able to do this over Arrow Flight, but potentially also with other ways of shipping Arrow data, such as pub/sub feeds, file dumps, etc.

Thanks,
Matt Rudary
Two Sigma