[PR] [FLINK-37213][table-runtime] Improve performance of unbounded OVER aggregations [flink]

via GitHub Fri, 24 Jan 2025 01:42:34 -0800


pnowojski opened a new pull request, #26075:
URL: https://github.com/apache/flink/pull/26075


   Instead of sorting all of the records based on the row time explicilty use 
timers to achieve the same thing.
   
   This version vs the previous one register a timer for each record, as 
opposed to just one timer per key. However since we are using RocksDB for 
timers, this is a minor problem. In exchange, we:
    - don't have to iterate over all of the state for each timer
    - we are firing timers only when needed, vs for each watermark for each 
key. For example if watermarks are fire every 200ms and for a given key, we 
have only one record that should be fired 20s into the future, the previous 
version would be firing a timer for that key for each watermark unnecessarily 
without doing any work.
   
   ## Verifying this change
   
   This is covered by various existing tests, that were parametrised to check 
for both old and new code paths.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (**yes** / no 
/ don't know)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (**yes** / no)
     - If yes, how is the feature documented? (not applicable / **docs** / 
**JavaDocs** / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[PR] [FLINK-37213][table-runtime] Improve performance of unbounded OVER aggregations [flink]

Reply via email to