Hi,

I'm trying to implement customized state logic with KeyedProcessFunction.
But I'm not quite sure how to implement the correct watermark behavior when
late data is involved.

According to the answer on stackoverflow:
https://stackoverflow.com/questions/59468154/how-to-sort-an-out-of-order-event-time-stream-using-flink
, there should be a state buffering all events until watermark passed the
expected time and a event time trigger will fetch from the state and do the
calculation. The buffer type should be Map<T, List<E>> where T is the
timestamp and E is the event type.

However, the interface provided by Flink currently is only a MapStae<K, V>.
If the V is a List<E> and buffered all events, every time an event comes
Flink will do ser/deser and could be very expensive when throughput is huge.

I checked the built-in window implementation which implements the watermark
buffering.  It seems that WindowOperator consists of some InternalStates,
 of which signature is where window is namespace or key, if I understand
correctly. But internal states are not available for Flink users.

So my question is: is there an efficient way to simulate watermark
buffering using process function for Flink users?

Thanks.

Reply via email to