Hi,

I am getting a headache when thinking about watermarks and timestamps.
My application reads events from Kafka  (they are in json format) as a
Datastream
Events can be keyed by a transactionId and have a event timestamp
(handlingTime)

All events belonging to a single transactionId will arrive in a window of a
couple of minutes (say max 5 minutes).
As soon as this 5 minutes has passed it should calculate the differences in
timestamp between the ordered events, add that elapsed time to every event
and emit them to the Sink.

I basically want to use the table api to do
"SELECT transactionId, handlingTime, handlingTime - lag(handlingTime) over
(partition by transactionId order by handlingTime) as elapsedTime,
originalEvent FROM InputTable"

After the result of this query has been pushed to the Sink all data with
respect to this transactionId can be discarded.

What kind of watermark do I need to use?
- bounded out of orderness?
- with idleness?
- ...

Late events can be ignored. They will rarely happen.

Regards Hans-Peter

Reply via email to