Hey Matt,

For Flight: for DoGet/DoPut/DoExchange, you can accomplish with the 
app_metadata fields built in to these methods. For instance, in 
DoGet/DoExchange, you could send some batches of data, then send a message with 
only an app_metadata field encoding the watermark. (The app_metadata field is a 
byte blob, and you can impart structure on it with JSON or Protobuf or the 
scheme of your choice.) The app_metadata field could also be appended to the 
side of a record batch as well.

For Arrow files: ARROW-16131 in Arrow 8.0.0 gives you the ability to add custom 
key-value metadata to individual batches, and Arrow already supported key-value 
metadata in files, so you could use either of these to record a watermark. 

Do these seem workable? Regarding the roadmap, I don't think we've seen 
requests for this previously; what sort of support would help in the compute 
API?

-David

On Wed, Apr 27, 2022, at 13:02, Matt Rudary wrote:
> Hi,
>
> We're looking at using Arrow as part of our solution to ship tabular 
> data between different streaming systems, potentially implemented using 
> different technologies, like Spark, Beam, Flink, etc. Some of these 
> systems contain "watermarks" as a key concept. Briefly, a watermark is 
> a promise that a certain data source will not produce any more 
> events/rows with a timestamp earlier than a given time. For example, if 
> I produce a batch of rows every 5 minutes, after I've finished sending 
> the 12:00 data, I would send a watermark update of 12:04:59, thus 
> letting downstream consumers know that no future row from me will have 
> a timestamp before 12:05.
>
> We would like to be able to propagate watermarks with our data, and I 
> wondered if this list has any ideas of how to do this currently, or 
> whether it is part of the roadmap for the Arrow compute api or similar. 
> We'd like to be able to do this over Arrow Flight, but potentially also 
> for other methods of shipping Arrow data, like pubsub feeds, file 
> dumps, etc.
>
> Thanks
> Matt Rudary
> Two Sigma

Reply via email to