Roman Khachatryan created FLINK-26062:
-----------------------------------------

             Summary: [Changelog] Non-deterministic recovery of PriorityQueue states
                 Key: FLINK-26062
                 URL: https://issues.apache.org/jira/browse/FLINK-26062
             Project: Flink
          Issue Type: Bug
          Components: Runtime / State Backends
    Affects Versions: 1.15.0
            Reporter: Roman Khachatryan
            Assignee: Roman Khachatryan
             Fix For: 1.15.0


Currently, InternalPriorityQueue.poll() is logged as a separate operation, 
without specifying the element that has been polled. On recovery, this recorded 
poll() is replayed.

However, this is not deterministic because the order of PQ elements with equal 
priority is not specified. For example, TimerHeapInternalTimer only compares 
timestamps, which are often equal. As a result, replay can poll timers from the 
queue in a different order than the original run, so the wrong timers are 
dropped and some timers never fire.
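
A minimal sketch of the tie-breaking problem, using java.util.PriorityQueue as a 
stand-in for the Flink heap queue. The Timer class, the comparator and 
firstPolledKey are hypothetical names for illustration only, not Flink code. 
The JDK documents that ties for the head are broken arbitrarily; on OpenJDK 
the polled element depends on insertion order, so two queues with the same 
contents can return different elements from poll():

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class PollReplayDemo {
    /** Hypothetical stand-in for a timer; not Flink's TimerHeapInternalTimer. */
    static final class Timer {
        final long timestamp;
        final String key;

        Timer(long timestamp, String key) {
            this.timestamp = timestamp;
            this.key = key;
        }
    }

    /** Orders timers by timestamp only, so timers with equal timestamps tie. */
    static final Comparator<Timer> BY_TIMESTAMP =
            Comparator.comparingLong((Timer t) -> t.timestamp);

    /** Inserts two timers with the same timestamp and returns the key polled first. */
    static String firstPolledKey(String firstInserted, String secondInserted) {
        PriorityQueue<Timer> queue = new PriorityQueue<>(BY_TIMESTAMP);
        queue.add(new Timer(100L, firstInserted));
        queue.add(new Timer(100L, secondInserted));
        return queue.poll().key;
    }

    public static void main(String[] args) {
        // Same contents, different insertion order, as may happen after a restore.
        System.out.println("original run polled: " + firstPolledKey("a", "b"));
        System.out.println("replay polled:       " + firstPolledKey("b", "a"));
    }
}
```

If the changelog only records "poll", the replayed queue may therefore drop a 
different timer than the original run did.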

 

ProcessingTimeWindowCheckpointingITCase.testAggregatingSlidingProcessingTimeWindow
 fails with materialization enabled when using the heap state backend (both 
the in-memory and fs-based implementations).

 

The proposed solution is to replace the logged poll with a remove operation. 
Remove identifies the element by equality, so replay does not depend on the 
order of elements in the queue.
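
A rough sketch of that idea, again with java.util.PriorityQueue standing in 
for the Flink queue. Timer, pollAndLog, replay and run are hypothetical names, 
and the list-based changelog is an assumption made for illustration; the 
actual changelog format is not described here. The polled element itself is 
recorded, and replay removes it by equality:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.PriorityQueue;

public class RemoveReplayDemo {
    /** Hypothetical timer: ordered by timestamp only, equal by timestamp and key. */
    static final class Timer {
        final long timestamp;
        final String key;

        Timer(long timestamp, String key) {
            this.timestamp = timestamp;
            this.key = key;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Timer)) return false;
            Timer other = (Timer) o;
            return timestamp == other.timestamp && key.equals(other.key);
        }

        @Override
        public int hashCode() {
            return Objects.hash(timestamp, key);
        }
    }

    static final Comparator<Timer> BY_TIMESTAMP =
            Comparator.comparingLong((Timer t) -> t.timestamp);

    /** Polls the head and logs it as a removal of that specific element. */
    static Timer pollAndLog(PriorityQueue<Timer> queue, List<Timer> changelog) {
        Timer head = queue.poll();
        if (head != null) changelog.add(head);
        return head;
    }

    /** Replays the logged removals by equality instead of calling poll(). */
    static void replay(PriorityQueue<Timer> queue, List<Timer> changelog) {
        for (Timer removed : changelog) queue.remove(removed);
    }

    /** Returns the keys left in the original queue and in the restored queue. */
    static String[] run() {
        PriorityQueue<Timer> original = new PriorityQueue<>(BY_TIMESTAMP);
        original.add(new Timer(100L, "a"));
        original.add(new Timer(100L, "b"));
        List<Timer> changelog = new ArrayList<>();
        pollAndLog(original, changelog);

        // Restored queue holds the same timers, inserted in a different order.
        PriorityQueue<Timer> restored = new PriorityQueue<>(BY_TIMESTAMP);
        restored.add(new Timer(100L, "b"));
        restored.add(new Timer(100L, "a"));
        replay(restored, changelog);

        return new String[] {original.peek().key, restored.peek().key};
    }

    public static void main(String[] args) {
        String[] remaining = run();
        System.out.println("original keeps: " + remaining[0]);
        System.out.println("restored keeps: " + remaining[1]);
    }
}
```

Because the removal names a concrete element, both queues end up with the 
same remaining timer regardless of how ties are ordered internally.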
 
cc: [~masteryhx], [~ym], [~yunta]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
