ion created FLINK-39898:
---------------------------
Summary: ProcessingTimeService timer callbacks delayed until
checkpoint when mini-batch and unaligned checkpoints are enabled
Key: FLINK-39898
URL: https://issues.apache.org/jira/browse/FLINK-39898
Project: Flink
Issue Type: Bug
Affects Versions: 2.1.2
Reporter: ion
h3. Description
When both {{table.exec.mini-batch.enabled = true}} and
{{execution.checkpointing.unaligned.enabled = true}} are set, timer callbacks
registered via {{ProcessingTimeService.registerTimer()}} are not executed
between checkpoints. They are only processed during checkpoint barrier handling.
This is a regression from Flink 1.20, introduced by the urgent mail system in
Flink 2.1 (FLINK-35796). Unaligned checkpoint barriers are submitted as urgent
mails, which causes non-urgent mails (including ProcessingTimeService timer
callbacks) to be skipped by
{{TaskMailboxImpl.tryTakeFromBatch()}}.
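To illustrate the starvation pattern described above, here is a minimal toy model in plain Java. It is not Flink's {{TaskMailboxImpl}}. The class {{ToyMailbox}}, its methods, and the mail names are hypothetical. The drain step at checkpoint time is an assumption that mirrors the observed behavior, not the actual code path.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy model only: NOT Flink's TaskMailboxImpl. All names are hypothetical. */
final class ToyMailbox {

    /** A queued action plus a flag marking it as urgent. */
    static final class Mail {
        final String name;
        final boolean urgent;
        final Runnable action;

        Mail(String name, boolean urgent, Runnable action) {
            this.name = name;
            this.urgent = urgent;
            this.action = action;
        }
    }

    private final Deque<Mail> queue = new ArrayDeque<>();

    void put(Mail mail) {
        queue.addLast(mail);
    }

    /** Takes the first urgent mail and leaves every non-urgent mail queued. */
    Mail tryTakeUrgentOnly() {
        Iterator<Mail> it = queue.iterator();
        while (it.hasNext()) {
            Mail mail = it.next();
            if (mail.urgent) {
                it.remove();
                return mail;
            }
        }
        return null;
    }

    /** Takes the head of the queue regardless of urgency. */
    Mail tryTakeAny() {
        return queue.pollFirst();
    }

    /** Runs the scenario and returns the mail names in execution order. */
    static List<String> simulate() {
        ToyMailbox mailbox = new ToyMailbox();
        List<String> executed = new ArrayList<>();

        // A timer callback is enqueued as a non-urgent mail.
        mailbox.put(new Mail("timer-callback", false, () -> executed.add("timer-callback")));

        // Between checkpoints the loop polls urgent mails only, so the timer never runs.
        for (int i = 0; i < 3; i++) {
            Mail mail = mailbox.tryTakeUrgentOnly();
            if (mail != null) {
                mail.action.run();
            }
        }

        // A barrier arrives as an urgent mail and is picked up immediately.
        mailbox.put(new Mail("checkpoint-barrier", true, () -> executed.add("checkpoint-barrier")));
        mailbox.tryTakeUrgentOnly().action.run();

        // Assumption: only now are the remaining (non-urgent) mails drained.
        Mail rest;
        while ((rest = mailbox.tryTakeAny()) != null) {
            rest.action.run();
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this sketch the timer callback was enqueued first but runs only after the barrier, which matches the symptom reported here.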
h3. Reproduction
* Flink 2.1.2
* {{table.exec.mini-batch.enabled = true}}
* {{execution.checkpointing.unaligned.enabled = true}}
* {{execution.checkpointing.interval = 3m}}
* Any operator that uses {{ProcessingTimeService}} timers (e.g., upsert-kafka
sink with {{sink.buffer-flush.interval > 0}})
*Expected:* Timer callbacks fire at the registered interval.
*Actual:* Timer callbacks only fire at checkpoint time.
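For convenience, the settings from the reproduction steps can be collected as plain key/value pairs. This is a sketch: the class name {{ReproConfig}} is hypothetical, and the report does not list any further mini-batch options the job may need, so none are included. How the map is handed to Flink (configuration file, SQL {{SET}} statements, or a {{Configuration}} object) is left to the reader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that gathers the settings named in the reproduction steps. */
final class ReproConfig {

    /** Returns the three reported settings in the order they are listed. */
    static Map<String, String> reproSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("table.exec.mini-batch.enabled", "true");
        conf.put("execution.checkpointing.unaligned.enabled", "true");
        conf.put("execution.checkpointing.interval", "3m");
        return conf;
    }

    public static void main(String[] args) {
        // Print each setting as "key: value".
        reproSettings().forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```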
h3. Test Results
||Configuration||Timer works between checkpoints||
|Flink 1.20 + unaligned ON|(/) works|
|Flink 2.1 + unaligned OFF|(/) works|
|Flink 2.1 + unaligned ON|(x) broken|
h3. Workaround
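Disable either mini-batch ({{table.exec.mini-batch.enabled = false}}) or unaligned checkpoints ({{execution.checkpointing.unaligned.enabled = false}}).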