I'm afraid not. The DataStream window implementation uses internal APIs to manipulate the state backend namespace, which isn't possible to do with the public-facing API. And without this, you can't implement this as efficiently.
David On Wed, Feb 16, 2022 at 12:04 PM Ruibin Xing <xingro...@gmail.com> wrote: > Hi, > > I'm trying to implement customized state logic with KeyedProcessFunction. > But I'm not quite sure how to implement the correct watermark behavior when > late data is involved. > > According to the answer on stackoverflow: > https://stackoverflow.com/questions/59468154/how-to-sort-an-out-of-order-event-time-stream-using-flink > , there should be a state buffering all events until watermark passed the > expected time and a event time trigger will fetch from the state and do the > calculation. The buffer type should be Map<T, List<E>> where T is the > timestamp and E is the event type. > > However, the interface provided by Flink currently is only a MapStae<K, > V>. If the V is a List<E> and buffered all events, every time an event > comes Flink will do ser/deser and could be very expensive when throughput > is huge. > > I checked the built-in window implementation which implements the > watermark buffering. It seems that WindowOperator consists of some > InternalStates, of which signature is where window is namespace or key, if > I understand correctly. But internal states are not available for Flink > users. > > So my question is: is there an efficient way to simulate watermark > buffering using process function for Flink users? > > Thanks. >