[jira] [Updated] (FLINK-12912) Incorrect handling of task.checkpoint.alignment.max-size when one checkpoint subsumes another one

Flink Jira Bot (Jira) Thu, 22 Apr 2021 06:26:19 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-12912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Flink Jira Bot updated FLINK-12912:
-----------------------------------
    Labels: stale-major  (was: )

> Incorrect handling of task.checkpoint.alignment.max-size when one checkpoint 
> subsumes another one
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-12912
>                 URL: https://issues.apache.org/jira/browse/FLINK-12912
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.6.4, 1.7.2, 1.8.0
>            Reporter: Piotr Nowojski
>            Priority: Major
>              Labels: stale-major
>
> {{BarrierBuffer#numQueuedBytes}}, which is used to evaluate the
> {{task.checkpoint.alignment.max-size}} limit, is not handled correctly if one
> checkpoint subsumes another one.
> The max size limit is checked against the sum of {{numQueuedBytes}} and
> {{bufferBlocker.getBytesBlocked()}}. {{getBytesBlocked}} keeps track of
> the alignment size of only the latest checkpoint. The bug is in the
> {{BarrierBuffer#releaseBlocksAndResetBarriers()}} method, where, while
> handling the first subsumed checkpoint, in the branch:
> {code:java}
>               if (currentBuffered == null) {
>                       // common case: no more buffered data
>                       currentBuffered = bufferBlocker.rollOverReusingResources();
>                       if (currentBuffered != null) {
>                               currentBuffered.open();
>                       }
>               }
> {code}
> we clear the {{bufferBlocker.getBytesBlocked()}} counter but do not
> update the {{numQueuedBytes}} counter.
> For example, if the first checkpoint had reached 99.9% of the max alignment
> size when it was subsumed, then due to this bug the calculated alignment size
> drops to 0 bytes. For subsequent subsumed checkpoints, {{numQueuedBytes}} is
> updated correctly.
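
The accounting problem described above can be illustrated with a toy model. This is only a sketch, not Flink's BarrierBuffer: the class AlignmentSizeSketch, its methods, the accountOnRollOver flag, and the 1000-byte limit are hypothetical names and values chosen for illustration. Only the two counters (numQueuedBytes and the blocked-bytes counter) and the roll-over step come from the report. Adding the rolled-over bytes to numQueuedBytes is one possible way to keep them counted and is not claimed to be the fix adopted in Flink.

```java
// Toy model of the two counters from the report. NOT Flink code.
final class AlignmentSizeSketch {
    // Stand-in for BarrierBuffer#numQueuedBytes.
    private long numQueuedBytes = 0;
    // Stand-in for bufferBlocker.getBytesBlocked(): latest checkpoint only.
    private long bytesBlocked = 0;
    // Hypothetical switch: if true, rolled-over bytes stay accounted for.
    private final boolean accountOnRollOver;

    AlignmentSizeSketch(boolean accountOnRollOver) {
        this.accountOnRollOver = accountOnRollOver;
    }

    // Data arriving on a blocked channel during alignment.
    void block(long bytes) {
        bytesBlocked += bytes;
    }

    // Models the roll-over in the "currentBuffered == null" branch:
    // the blocker's counter is cleared.
    void rollOver() {
        if (accountOnRollOver) {
            numQueuedBytes += bytesBlocked; // assumed accounting, for illustration
        }
        bytesBlocked = 0;
    }

    // The value that would be compared against the max alignment size.
    long alignmentSize() {
        return numQueuedBytes + bytesBlocked;
    }

    // Blocks 999 of a hypothetical 1000-byte limit, then the checkpoint is subsumed.
    public static long simulate(boolean accountOnRollOver) {
        AlignmentSizeSketch sketch = new AlignmentSizeSketch(accountOnRollOver);
        sketch.block(999);
        sketch.rollOver();
        return sketch.alignmentSize();
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints 0: the blocked bytes are forgotten
        System.out.println(simulate(true));  // prints 999: the bytes remain counted
    }
}
```

With the flag off, the model reproduces the reported symptom: a checkpoint that was at 99.9% of the limit is reported as having 0 bytes of alignment data right after it is subsumed.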



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
